Posted to common-user@hadoop.apache.org by "Korb, Michael [USA]" <Ko...@bah.com> on 2011/02/07 16:51:40 UTC
Hadoop XML Error
I am running two instances of Hadoop on a cluster and want to copy all the data from hadoop1 to the updated hadoop2. From hadoop2, I am running the command "hadoop distcp -update hftp://mc00001:50070/ hftp://mc00000:50070/" where mc00001 is the namenode of hadoop1 and mc00000 is the namenode of hadoop2. I get the following error:
11/02/07 10:12:31 INFO tools.DistCp: srcPaths=[hftp://mc00001:50070/]
11/02/07 10:12:31 INFO tools.DistCp: destPath=hftp://mc00000:50070/
[Fatal Error] :1:215: XML document structures must start and end within the same entity.
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: invalid xml directory content
at org.apache.hadoop.hdfs.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:350)
at org.apache.hadoop.hdfs.HftpFileSystem$LsParser.getFileStatus(HftpFileSystem.java:355)
at org.apache.hadoop.hdfs.HftpFileSystem.getFileStatus(HftpFileSystem.java:384)
at org.apache.hadoop.tools.DistCp.sameFile(DistCp.java:1227)
at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1120)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:666)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
Caused by: org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
at org.apache.hadoop.hdfs.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:344)
... 9 more
I am fairly certain that none of the XML files are malformed or corrupted. This thread (http://www.mail-archive.com/core-dev@hadoop.apache.org/msg18064.html) discusses a similar problem caused by file permissions but doesn't seem to offer a solution. Any help would be appreciated.
Thanks,
Mike
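[An aside for anyone hitting this: the SAXParseException suggests the directory listing the namenode returned was cut off mid-document, and you can check that by fetching the raw XML yourself. A hedged sketch -- the /listPaths servlet path matches 0.20-era HFTP and the mc00001 hostname is taken from the thread, so verify both against your setup:]

```shell
# Fetch the raw XML directory listing that HftpFileSystem parses.
# mc00001:50070 is the source namenode from the thread; /listPaths is the
# servlet 0.20-era HFTP reads from -- verify against your Hadoop version.
curl -s "http://mc00001:50070/listPaths/?recursive=yes" > listing.xml

# If the response is truncated, the closing tag never arrives, which is
# exactly what "must start and end within the same entity" complains about.
tail -c 200 listing.xml
```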
Re: Hadoop XML Error
Posted by Ted Dunning <td...@maprtech.com>.
This is due to the security API not being available. You are crossing from
a cluster with security to one without, and that is causing confusion.
Presumably your client assumes the API is available, but your Hadoop library
doesn't provide it.
Check your classpath very carefully, looking for version assumptions and
confusion.
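[Concretely, a first pass at that classpath audit might look like the following sketch; the paths are illustrative, not taken from the thread:]

```shell
# Which launcher does "hadoop" resolve to -- hadoop1's or hadoop2's?
which hadoop
hadoop version

# Look for more than one hadoop-core jar reachable from the client; mixed
# versions on one classpath are a classic source of NoSuchMethodError.
echo "$HADOOP_CLASSPATH" | tr ':' '\n' | grep -i hadoop
ls /usr/lib/hadoop*/hadoop-core-*.jar 2>/dev/null
```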
On Mon, Feb 7, 2011 at 11:43 AM, Korb, Michael [USA] <Ko...@bah.com> wrote:
> We're migrating from CDH3b3 to a recent build of 0.20-append published by
> Ryan Rawson. This isn't something covered by normal upgrade scripts. I've
> tried several commands with different protocols and port numbers, but now
> keep getting the same error:
>
> 11/02/07 14:35:06 INFO tools.DistCp: srcPaths=[hftp://mc00001:50070/]
> 11/02/07 14:35:06 INFO tools.DistCp: destPath=hdfs://mc00000:55310/
> Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapred.JobConf.getCredentials()Lorg/apache/hadoop/security/Credentials;
> at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:632)
> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>
> Has anyone seen this before? What might be causing it?
>
> Thanks,
> Mike
>
>
> ________________________________________
> From: Xavier Stevens [xstevens@mozilla.com]
> Sent: Monday, February 07, 2011 1:47 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Hadoop XML Error
>
> You don't need to distcp to upgrade a cluster. You just need to go
> through the upgrade process. Bumping from 0.20.2 to 0.20.3 you might
> not even need to do anything other than stop the cluster processes, and
> then restart them using the 0.20.3 install.
>
> Here's a link to the upgrade and rollback docs:
>
> http://hadoop.apache.org/common/docs/r0.20.0/hdfs_user_guide.html#Upgrade+and+Rollback
>
>
> -Xavier
>
>
> On 2/7/11 10:22 AM, Korb, Michael [USA] wrote:
> > Xavier,
> >
> > Yes, I'm trying to upgrade from 0.20.2 to 0.20.3. Both are running on the
> same cluster. I'm trying to distcp everything from the 0.20.2 instance over
> to the 0.20.3 instance, without any luck yet.
> >
> > Mike
> > ________________________________________
> > From: Xavier Stevens [xstevens@mozilla.com]
> > Sent: Monday, February 07, 2011 1:20 PM
> > To: common-user@hadoop.apache.org
> > Subject: Re: Hadoop XML Error
> >
> > Mike,
> >
> > Are you just trying to upgrade then? I've never heard of anyone trying
> > to run two versions of Hadoop on the same cluster. I don't think
> > that's even possible, but maybe someone else knows.
> >
> > -Xavier
> >
> >
> > On 2/7/11 10:03 AM, Korb, Michael [USA] wrote:
> >> Xavier,
> >>
> >> Both instances of Hadoop are running on the same cluster. I tried the
> >> command "sudo -u hdfs ./hadoop distcp -update hftp://mc00001:50070/
> >> hdfs://mc00000:55310" from the hadoop2 bin directory (the 0.20.3 install)
> >> on mc00000 (the port 55310 is specified in core-site.xml). Now I'm
> >> getting this:
> >>
> >> 11/02/07 13:03:14 INFO tools.DistCp: srcPaths=[hftp://mc00001:50070/]
> >> 11/02/07 13:03:14 INFO tools.DistCp: destPath=hdfs://mc00000:55310
> >> Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapred.JobConf.getCredentials()Lorg/apache/hadoop/security/Credentials;
> >> at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:632)
> >> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
> >> at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> >> at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
> >>
> >> Thanks,
> >> Mike
> >> ________________________________________
> >> From: Xavier Stevens [xstevens@mozilla.com]
> >> Sent: Monday, February 07, 2011 12:56 PM
> >> To: common-user@hadoop.apache.org
> >> Subject: Re: Hadoop XML Error
> >>
> >> Mike,
> >>
> >> I've seen this when a directory has been removed or has gone missing
> >> between the time distcp stats the source files and the copy itself.
> >> You'll probably want to make sure that no code or person is messing with
> >> the filesystem during your copy. I would make sure you only have one
> >> version of Hadoop installed on your destination cluster. Also, you
> >> should use hdfs as the destination protocol and run the command as the
> >> hdfs user if you're using Hadoop security.
> >>
> >> Example (Running on destination cluster):
> >>
> >> sudo -u hdfs /usr/lib/hadoop-0.20.3/bin/hadoop distcp -update
> >> hftp://mc00001:50070/ hdfs://mc00000:8020/
> >>
> >> Cheers,
> >>
> >>
> >> -Xavier
> >>
> >>
> >> On 2/7/11 9:39 AM, Korb, Michael [USA] wrote:
> >>> I'm trying to copy from 0.20.2 to 0.20.3. I was trying to follow the
> >>> DistCp Guide but I think I know the problem. I'm trying to run the
> >>> command on the destination cluster, but when I call hadoop, I think the
> >>> path is set to run the hadoop1 executable. So I tried going to the
> >>> hadoop2 install and running it with "./hadoop distcp -update
> >>> hftp://mc00001:50070/ hdfs://mc00000:55310/" but now I get this error:
> >>>
> >>> 11/02/07 12:38:09 INFO tools.DistCp: srcPaths=[hftp://mc00001:50070/]
> >>> 11/02/07 12:38:09 INFO tools.DistCp: destPath=hdfs://mc00000:55310/
> >>> Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapred.JobConf.getCredentials()Lorg/apache/hadoop/security/Credentials;
> >>> at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:632)
> >>> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
> >>> at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
> >>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> >>> at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
> >>>
> >>>
> >>> ________________________________________
> >>> From: Sonal Goyal [sonalgoyal4@gmail.com]
> >>> Sent: Monday, February 07, 2011 12:11 PM
> >>> To: common-user@hadoop.apache.org
> >>> Subject: Re: Hadoop XML Error
> >>>
> >>> Mike,
> >>>
> >>> This error is not related to malformed XML in the files you are trying
> >>> to copy; it means that, for some reason, the source or destination
> >>> listing could not be retrieved or parsed. Are you trying to copy
> >>> between different versions of clusters? As far as I know, your
> >>> destination should be writable, and distcp should be run from the
> >>> destination cluster. See more here:
> >>>
> >>> Let us know how it goes.
> >>>
> >>> Thanks and Regards,
> >>> Sonal
> >>> Connect Hadoop with databases, Salesforce, FTP servers and others
> >>> <https://github.com/sonalgoyal/hiho>
> >>> Nube Technologies <http://www.nubetech.co>
> >>>
> >>> <http://in.linkedin.com/in/sonalgoyal>
RE: Hadoop XML Error
Posted by "Korb, Michael [USA]" <Ko...@bah.com>.
We're migrating from CDH3b3 to a recent build of 0.20-append published by Ryan Rawson. This isn't something covered by normal upgrade scripts. I've tried several commands with different protocols and port numbers, but now keep getting the same error:
11/02/07 14:35:06 INFO tools.DistCp: srcPaths=[hftp://mc00001:50070/]
11/02/07 14:35:06 INFO tools.DistCp: destPath=hdfs://mc00000:55310/
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapred.JobConf.getCredentials()Lorg/apache/hadoop/security/Credentials;
at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:632)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
Has anyone seen this before? What might be causing it?
Thanks,
Mike
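[One way to pin down a NoSuchMethodError like this is to ask javap whether the JobConf class in each candidate jar actually has getCredentials(). A sketch, assuming jars under /usr/lib -- adjust the glob to your install:]

```shell
# javap ships with the JDK; it prints the methods a compiled class exposes.
# A count of 0 means that jar's JobConf predates the security API.
for jar in /usr/lib/hadoop*/hadoop-core-*.jar; do
  echo "== $jar"
  javap -classpath "$jar" org.apache.hadoop.mapred.JobConf | grep -c getCredentials
done
```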
Re: Hadoop XML Error
Posted by Xavier Stevens <xs...@mozilla.com>.
You don't need to distcp to upgrade a cluster. You just need to go
through the upgrade process. Bumping from 0.20.2 to 0.20.3, you might
not even need to do anything other than stop the cluster processes and
then restart them using the 0.20.3 install.
Here's a link to the upgrade and rollback docs:
http://hadoop.apache.org/common/docs/r0.20.0/hdfs_user_guide.html#Upgrade+and+Rollback
-Xavier
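[For reference, the upgrade sequence from those docs boils down to something like this sketch; run it from the new install's directory as the user that owns the cluster processes, and verify each step against the guide for your exact version:]

```shell
# 1. Stop the old (0.20.2) cluster cleanly.
bin/stop-all.sh

# 2. Start the new version's HDFS with -upgrade so it converts the on-disk
#    layout while keeping the previous version around for rollback.
bin/start-dfs.sh -upgrade

# 3. Poll until the upgrade is reported complete.
bin/hadoop dfsadmin -upgradeProgress status

# 4. Only after verifying your data, discard the rollback image.
bin/hadoop dfsadmin -finalizeUpgrade
```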
RE: Hadoop XML Error
Posted by "Korb, Michael [USA]" <Ko...@bah.com>.
Xavier,
Yes, I'm trying to upgrade from 0.20.2 to 0.20.3. Both are running on the same cluster. I'm trying to distcp everything from the 0.20.2 instance over to the 0.20.3 instance, without any luck yet.
Mike
________________________________________
From: Xavier Stevens [xstevens@mozilla.com]
Sent: Monday, February 07, 2011 1:20 PM
To: common-user@hadoop.apache.org
Subject: Re: Hadoop XML Error
Mike,
Are you just trying to upgrade then? I've never heard of anyone trying
to run two versions of hadoop on the same cluster. I'm don't think
that's even possible, but maybe someone else knows.
-Xavier
On 2/7/11 10:03 AM, Korb, Michael [USA] wrote:
> Xavier,
>
> Both instances of Hadoop are running on the same cluster. I tried the command "sudo -u hdfs ./hadoop distcp -update hftp://mc00001:50070/ hdfs://mc00000:55310" from the hadoop2 bin directory (the 0.20.3 install) on mc00000 (the port 55310 is specified in core-site.xml). Now I'm getting this:
>
> 11/02/07 13:03:14 INFO tools.DistCp: srcPaths=[hftp://mc00001:50070/]
> 11/02/07 13:03:14 INFO tools.DistCp: destPath=hdfs://mc00000:55310
> Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapred.JobConf.getCredentials()Lorg/apache/hadoop/security/Credentials;
> at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:632)
> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>
> Thanks,
> Mike
> ________________________________________
> From: Xavier Stevens [xstevens@mozilla.com]
> Sent: Monday, February 07, 2011 12:56 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Hadoop XML Error
>
> Mike,
>
> I've seen this when a directory has been removed or is missing from the
> time distcp starting stating the source files. You'll probably want to
> make sure that no code or person is messing with the filesystem during
> your copy. I would make sure you only have one version of hadoop
> installed on your destination cluster. Also you should use hdfs as the
> destination protocol and run the command as the hdfs user if you're
> using hadoop security.
>
> Example (Running on destination cluster):
>
> sudo -u hdfs /usr/lib/hadoop-0.20.3/bin/hadoop distcp -update
> hftp://mc00001:50070/ hdfs://mc00000:8020/
>
> Cheers,
>
>
> -Xavier
Re: Hadoop XML Error
Posted by Xavier Stevens <xs...@mozilla.com>.
Mike,
Are you just trying to upgrade then? I've never heard of anyone trying
to run two versions of hadoop on the same cluster. I don't think
that's even possible, but maybe someone else knows.
-Xavier
On 2/7/11 10:03 AM, Korb, Michael [USA] wrote:
> Xavier,
>
> Both instances of Hadoop are running on the same cluster. I tried the command "sudo -u hdfs ./hadoop distcp -update hftp://mc00001:50070/ hdfs://mc00000:55310" from the hadoop2 bin directory (the 0.20.3 install) on mc00000 (the port 55310 is specified in core-site.xml). Now I'm getting this:
>
> 11/02/07 13:03:14 INFO tools.DistCp: srcPaths=[hftp://mc00001:50070/]
> 11/02/07 13:03:14 INFO tools.DistCp: destPath=hdfs://mc00000:55310
> Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapred.JobConf.getCredentials()Lorg/apache/hadoop/security/Credentials;
> at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:632)
> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>
> Thanks,
> Mike
RE: Hadoop XML Error
Posted by "Korb, Michael [USA]" <Ko...@bah.com>.
Xavier,
Both instances of Hadoop are running on the same cluster. I tried the command "sudo -u hdfs ./hadoop distcp -update hftp://mc00001:50070/ hdfs://mc00000:55310" from the hadoop2 bin directory (the 0.20.3 install) on mc00000 (the port 55310 is specified in core-site.xml). Now I'm getting this:
11/02/07 13:03:14 INFO tools.DistCp: srcPaths=[hftp://mc00001:50070/]
11/02/07 13:03:14 INFO tools.DistCp: destPath=hdfs://mc00000:55310
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapred.JobConf.getCredentials()Lorg/apache/hadoop/security/Credentials;
at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:632)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
Thanks,
Mike
________________________________________
From: Xavier Stevens [xstevens@mozilla.com]
Sent: Monday, February 07, 2011 12:56 PM
To: common-user@hadoop.apache.org
Subject: Re: Hadoop XML Error
Mike,
I've seen this when a directory has been removed or goes missing between the
time distcp stats the source files and the copy itself. You'll probably want to
make sure that no code or person is messing with the filesystem during
your copy. I would make sure you only have one version of hadoop
installed on your destination cluster. Also you should use hdfs as the
destination protocol and run the command as the hdfs user if you're
using hadoop security.
Example (Running on destination cluster):
sudo -u hdfs /usr/lib/hadoop-0.20.3/bin/hadoop distcp -update
hftp://mc00001:50070/ hdfs://mc00000:8020/
Cheers,
-Xavier
Re: Hadoop XML Error
Posted by Xavier Stevens <xs...@mozilla.com>.
Mike,
I've seen this when a directory has been removed or goes missing between the
time distcp stats the source files and the copy itself. You'll probably want to
make sure that no code or person is messing with the filesystem during
your copy. I would make sure you only have one version of hadoop
installed on your destination cluster. Also you should use hdfs as the
destination protocol and run the command as the hdfs user if you're
using hadoop security.
Example (Running on destination cluster):
sudo -u hdfs /usr/lib/hadoop-0.20.3/bin/hadoop distcp -update
hftp://mc00001:50070/ hdfs://mc00000:8020/
Cheers,
-Xavier
RE: Hadoop XML Error
Posted by "Korb, Michael [USA]" <Ko...@bah.com>.
I'm trying to copy from 0.20.2 to 0.20.3. I was trying to follow the DistCp Guide, but I think I know the problem: I'm running the command on the destination cluster, but when I call hadoop, my PATH resolves to the hadoop1 executable. So I tried going to the hadoop2 install and running it with "./hadoop distcp -update hftp://mc00001:50070/ hdfs://mc00000:55310/" but now I get this error:
11/02/07 12:38:09 INFO tools.DistCp: srcPaths=[hftp://mc00001:50070/]
11/02/07 12:38:09 INFO tools.DistCp: destPath=hdfs://mc00000:55310/
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapred.JobConf.getCredentials()Lorg/apache/hadoop/security/Credentials;
at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:632)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
________________________________________
From: Sonal Goyal [sonalgoyal4@gmail.com]
Sent: Monday, February 07, 2011 12:11 PM
To: common-user@hadoop.apache.org
Subject: Re: Hadoop XML Error
Mike,
This error is not related to the XML files you are trying to copy being
malformed; it means that, for some reason, the source or destination
directory listing cannot be retrieved or parsed. Are you trying to copy
between different versions of clusters? As far as I know, the destination
should be writable, and distcp should be run
from the destination cluster. See more here:
http://hadoop.apache.org/common/docs/r0.20.2/distcp.html
Let us know how it goes.
Thanks and Regards,
Sonal
<https://github.com/sonalgoyal/hiho>Connect Hadoop with databases,
Salesforce, FTP servers and others <https://github.com/sonalgoyal/hiho>
Nube Technologies <http://www.nubetech.co>
<http://in.linkedin.com/in/sonalgoyal>
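[Editor's note] The NoSuchMethodError above (JobConf.getCredentials missing) is the classic symptom of mixed Hadoop versions: a DistCp class from one release running against core jars from another. A minimal sanity check, sketched under the assumption that the 0.20.3 install lives at /usr/lib/hadoop-0.20.3 as quoted elsewhere in the thread (adjust to the real layout):

```shell
#!/bin/sh
# Sketch: put the intended install first on PATH, then confirm which
# launcher the shell actually resolves. The install path is an assumption.
HADOOP_HOME=/usr/lib/hadoop-0.20.3
PATH="$HADOOP_HOME/bin:$PATH"
export HADOOP_HOME PATH

# If this prints the hadoop1 path instead of $HADOOP_HOME/bin/hadoop,
# the old install is still shadowing the new one.
command -v hadoop || echo "no hadoop launcher on PATH"
```

If the release's bin/hadoop supports the classpath subcommand, inspecting `hadoop classpath` the same way can confirm that no 0.20.2 hadoop-core jar is still being picked up.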
Re: Hadoop XML Error
Posted by Sonal Goyal <so...@gmail.com>.
Mike,
This error is not related to the XML files you are trying to copy being
malformed; it means that, for some reason, the source or destination
directory listing cannot be retrieved or parsed. Are you trying to copy
between different versions of clusters? As far as I know, the destination
should be writable, and distcp should be run
from the destination cluster. See more here:
http://hadoop.apache.org/common/docs/r0.20.2/distcp.html
Let us know how it goes.
Thanks and Regards,
Sonal
<https://github.com/sonalgoyal/hiho>Connect Hadoop with databases,
Salesforce, FTP servers and others <https://github.com/sonalgoyal/hiho>
Nube Technologies <http://www.nubetech.co>
<http://in.linkedin.com/in/sonalgoyal>
On Mon, Feb 7, 2011 at 9:21 PM, Korb, Michael [USA] <Ko...@bah.com>wrote:
> I am running two instances of Hadoop on a cluster and want to copy all the
> data from hadoop1 to the updated hadoop2. From hadoop2, I am running the
> command "hadoop distcp -update hftp://mc00001:50070/ hftp://mc00000:50070/"
> where mc00001 is the namenode of hadoop1 and mc00000 is the namenode of
> hadoop2. I get the following error:
>
> 11/02/07 10:12:31 INFO tools.DistCp: srcPaths=[hftp://mc00001:50070/]
> 11/02/07 10:12:31 INFO tools.DistCp: destPath=hftp://mc00000:50070/
> [Fatal Error] :1:215: XML document structures must start and end within the
> same entity.
> With failures, global counters are inaccurate; consider running with -i
> Copy failed: java.io.IOException: invalid xml directory content
> at
> org.apache.hadoop.hdfs.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:350)
> at
> org.apache.hadoop.hdfs.HftpFileSystem$LsParser.getFileStatus(HftpFileSystem.java:355)
> at
> org.apache.hadoop.hdfs.HftpFileSystem.getFileStatus(HftpFileSystem.java:384)
> at org.apache.hadoop.tools.DistCp.sameFile(DistCp.java:1227)
> at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1120)
> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:666)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
> Caused by: org.xml.sax.SAXParseException: XML document structures must
> start and end within the same entity.
> at
> com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
> at
> org.apache.hadoop.hdfs.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:344)
> ... 9 more
>
> I am fairly certain that none of the XML files are malformed or corrupted.
> This thread (
> http://www.mail-archive.com/core-dev@hadoop.apache.org/msg18064.html)
> discusses a similar problem caused by file permissions but doesn't seem to
> offer a solution. Any help would be appreciated.
>
> Thanks,
> Mike
>
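[Editor's note] Pulling this advice together with the commands quoted elsewhere in the thread: a cross-version copy reads the 0.20.2 source over the read-only hftp protocol, writes over hdfs, and runs on the destination cluster. A sketch, with the hosts and the 55310 hdfs port taken from the thread itself (confirm against fs.default.name in core-site.xml), not universal defaults:

```shell
#!/bin/sh
# Sketch of the cross-version DistCp invocation; hosts and ports are
# the ones quoted in this thread, not defaults.
SRC="hftp://mc00001:50070/"   # old (0.20.2) cluster, read-only hftp
DST="hdfs://mc00000:55310/"   # new (0.20.3) cluster, writable hdfs
CMD="hadoop distcp -update $SRC $DST"
echo "$CMD"
# With hadoop security enabled, run it as the hdfs user: sudo -u hdfs $CMD
```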
Re: Hadoop XML Error
Posted by Xavier Stevens <xs...@mozilla.com>.
Mike,
I've seen this when a directory has been removed or goes missing between the
time distcp stats the source files and the copy itself. You'll probably want to
make sure that no code or person is messing with the filesystem during
your copy. Also you should use hdfs as the destination protocol.
Cheers,
-Xavier
On 2/7/11 7:51 AM, Korb, Michael [USA] wrote:
> I am running two instances of Hadoop on a cluster and want to copy all the data from hadoop1 to the updated hadoop2. From hadoop2, I am running the command "hadoop distcp -update hftp://mc00001:50070/ hftp://mc00000:50070/" where mc00001 is the namenode of hadoop1 and mc00000 is the namenode of hadoop2. I get the following error:
>
> 11/02/07 10:12:31 INFO tools.DistCp: srcPaths=[hftp://mc00001:50070/]
> 11/02/07 10:12:31 INFO tools.DistCp: destPath=hftp://mc00000:50070/
> [Fatal Error] :1:215: XML document structures must start and end within the same entity.
> With failures, global counters are inaccurate; consider running with -i
> Copy failed: java.io.IOException: invalid xml directory content
> at org.apache.hadoop.hdfs.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:350)
> at org.apache.hadoop.hdfs.HftpFileSystem$LsParser.getFileStatus(HftpFileSystem.java:355)
> at org.apache.hadoop.hdfs.HftpFileSystem.getFileStatus(HftpFileSystem.java:384)
> at org.apache.hadoop.tools.DistCp.sameFile(DistCp.java:1227)
> at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1120)
> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:666)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
> Caused by: org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
> at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
> at org.apache.hadoop.hdfs.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:344)
> ... 9 more
>
> I am fairly certain that none of the XML files are malformed or corrupted. This thread (http://www.mail-archive.com/core-dev@hadoop.apache.org/msg18064.html) discusses a similar problem caused by file permissions but doesn't seem to offer a solution. Any help would be appreciated.
>
> Thanks,
> Mike
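[Editor's note] The SAXParseException in the original report is thrown while HftpFileSystem parses the namenode's directory-listing XML, so one way to narrow it down is to fetch that listing directly and parse it yourself. A sketch; the /listPaths servlet path matches HFTP implementations of this era, but treat it, along with the host, as an assumption:

```shell
#!/bin/sh
# Sketch: pull the hftp directory listing that DistCp's LsParser reads
# and check whether the XML is well-formed. Servlet path and host are
# assumptions taken from this thread.
curl -s "http://mc00001:50070/listPaths/?recursive=yes" > /tmp/listing.xml
if python3 -c 'import sys, xml.dom.minidom; xml.dom.minidom.parse(sys.argv[1])' /tmp/listing.xml
then
    echo "listing parses"
else
    echo "listing truncated or malformed"
fi
```

A listing cut off mid-document (for example, by a connection drop, or by an entry the servlet could not stat) would produce exactly the "must start and end within the same entity" error.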