Posted to common-user@hadoop.apache.org by Bryan Duxbury <br...@rapleaf.com> on 2009/04/09 08:39:33 UTC
Issue distcp'ing from 0.19.2 to 0.18.3
Hey all,
I was trying to copy some data from our cluster on 0.19.2 to a new
cluster on 0.18.3 by using distcp and the hftp:// filesystem.
Everything seemed to be going fine for a few hours, but then a few
tasks failed because a few files returned 500 errors when being read
from the 0.19 cluster. As a result the job died. Now that I'm trying
to restart it, I get this error:
[rapleaf@ds-nn2 ~]$ hadoop distcp hftp://ds-nn1:7276/ hdfs://ds-nn2:7276/cluster-a
09/04/08 23:32:39 INFO tools.DistCp: srcPaths=[hftp://ds-nn1:7276/]
09/04/08 23:32:39 INFO tools.DistCp: destPath=hdfs://ds-nn2:7276/cluster-a
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.net.SocketException: Unexpected end of file from server
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:766)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1000)
        at org.apache.hadoop.dfs.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:183)
        at org.apache.hadoop.dfs.HftpFileSystem$LsParser.getFileStatus(HftpFileSystem.java:193)
        at org.apache.hadoop.dfs.HftpFileSystem.getFileStatus(HftpFileSystem.java:222)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:667)
        at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:588)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:609)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:768)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:788)
I changed nothing at all between the first attempt and the subsequent
failed attempts. The only clue in the namenode log for the 0.19
cluster is:
2009-04-08 23:29:09,786 WARN org.apache.hadoop.ipc.Server: Incorrect header or version mismatch from 10.100.50.252:47733 got version 47 expected version 2
Anyone have any ideas?
-Bryan
RE: Issue distcp'ing from 0.19.2 to 0.18.3
Posted by Koji Noguchi <kn...@yahoo-inc.com>.
Bryan,
hftp://ds-nn1:7276
hdfs://ds-nn2:7276
Are you using the same port number for hftp and hdfs?
Looking at the stack trace, it seems like it failed before starting a
distcp job.
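For context on Koji's question: in this Hadoop line the namenode listens on two separate ports, and each URI scheme must target the right one. A minimal configuration sketch (the property names are the stock 0.18/0.19 ones; the port values here are examples, not taken from the thread):

```
<!-- hadoop-site.xml sketch: the namenode's two distinct ports -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://ds-nn1:7276</value>   <!-- client RPC port: what hdfs:// URIs use -->
</property>
<property>
  <name>dfs.http.address</name>
  <value>ds-nn1:50070</value>         <!-- web UI/HTTP port: what hftp:// URIs use -->
</property>
```

Pointing hftp:// at the RPC port sends an HTTP request to the IPC server, which would explain both the "Unexpected end of file" on the client and the "Incorrect header or version mismatch" warning in the namenode log.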
Koji
RE: reduce task specific jvm arg
Posted by Koji Noguchi <kn...@yahoo-inc.com>.
This sounds like a reasonable request.
Created
https://issues.apache.org/jira/browse/HADOOP-5684
On our clusters, sometimes users want thin mappers and large reducers.
Koji
Re: reduce task specific jvm arg
Posted by Philip Zeyliger <ph...@cloudera.com>.
There doesn't seem to be. The command line for the JVM is computed in
org.apache.hadoop.mapred.TaskRunner#run().
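To spell out what "there doesn't seem to be" means in practice: before HADOOP-5684, the single JVM-options property applies to map and reduce tasks alike, so you cannot give reducers a larger heap without also giving it to mappers. A sketch of the one knob that does exist at this point (the -Xmx value is an example, not from the thread):

```
<!-- hadoop-site.xml sketch: one setting covers BOTH map and reduce child JVMs.
     Separate map/reduce opts only arrive with the fix tracked in HADOOP-5684. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>  <!-- example heap size -->
</property>
```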
reduce task specific jvm arg
Posted by Jun Rao <ju...@almaden.ibm.com>.
Hi,
Is there a way to set jvm parameters only for reduce tasks in Hadoop?
Thanks,
Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099
junrao@almaden.ibm.com
Re: Issue distcp'ing from 0.19.2 to 0.18.3
Posted by Bryan Duxbury <br...@rapleaf.com>.
Ah, never mind. It turns out that I just shouldn't rely on command
history so much. I accidentally pointed hftp:// at the actual
namenode RPC port, not the namenode HTTP port. It appears to be
starting a regular copy again.
-Bryan
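Bryan's fix amounts to changing only the source URI's port. A sketch of the corrected invocation, assuming the namenode's HTTP port is the stock default 50070 (the thread does not state the actual value):

```
# hftp:// must use the namenode's HTTP port; hdfs:// keeps the RPC port.
hadoop distcp hftp://ds-nn1:50070/ hdfs://ds-nn2:7276/cluster-a
```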
Re: Issue distcp'ing from 0.19.2 to 0.18.3
Posted by Todd Lipcon <to...@cloudera.com>.
Hey Bryan,
Any chance you can get a tshark trace on the 0.19 namenode? Maybe:
tshark -s 100000 -w nndump.pcap port 7276
Also, are the clocks synced on the two machines? The failure of your distcp
is at 23:32:39, but the namenode log message you posted was 23:29:09. Did
those messages actually pop out at the same time?
Thanks
-Todd