Posted to common-dev@hadoop.apache.org by "Johan Oskarsson (JIRA)" <ji...@apache.org> on 2007/12/07 15:04:43 UTC
[jira] Created: (HADOOP-2379) Distcp setup is slow
Distcp setup is slow
--------------------
Key: HADOOP-2379
URL: https://issues.apache.org/jira/browse/HADOOP-2379
Project: Hadoop
Issue Type: Improvement
Components: dfs
Affects Versions: 0.14.3
Environment: from 35 node cluster to 10 node cluster
Reporter: Johan Oskarsson
Priority: Minor
When starting a distcp, the setup phase often takes a very long time. For example, during the distcp I just ran, the setup phase took 15 minutes and the actual copy took 3 minutes. Could this be improved? Or could a progress bar at least be added so the user doesn't think it has stalled?
I also often see exceptions like this in the setup, but the distcp finishes eventually.
java.io.EOFException
    at java.io.DataInputStream.readShort(DataInputStream.java:298)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1672)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1744)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
    at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:774)
    at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.setup(CopyFiles.java:351)
    at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:773)
    at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:854)
    at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187)
    at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:864)
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2379) Distcp setup is slow
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549515 ]
Doug Cutting commented on HADOOP-2379:
--------------------------------------
This may be improved by HADOOP-2129, which improves the caching of FileStatus in DFS. You might try applying that patch to 0.14.3...