Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2013/04/13 16:32:16 UTC

[jira] [Commented] (HADOOP-9475) Distcp issue

    [ https://issues.apache.org/jira/browse/HADOOP-9475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631062#comment-13631062 ] 

Steve Loughran commented on HADOOP-9475:
----------------------------------------

I'm afraid I'm going to have to close this as an invalid issue unless you can show that there's a bug in DistCp that surfaces on your infrastructure.

http://wiki.apache.org/hadoop/InvalidJiraIssues

# DistCp works for everybody else; it is fast enough to bring down network links between sites if you aren't careful.
# It is implemented as a MapReduce job run on the source cluster, in which the mappers copy the files.
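
The second point can be made concrete with a minimal Python sketch of how a copy job might spread a file list across its mappers. This is a hypothetical greedy model (largest file first, each file going to the least-loaded mapper), not DistCp's own split logic, and the function names are illustrative. The model assumes that each file is copied whole by a single mapper, so the job can finish no sooner than its most heavily loaded mapper.

```python
import heapq
from typing import Dict, List, Tuple


def assign_files(files: Dict[str, int], num_mappers: int) -> List[List[str]]:
    """Hypothetical greedy assignment: each file (name -> size in bytes) goes,
    whole, to whichever mapper has the fewest bytes assigned so far."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be >= 1")
    # Min-heap of (bytes_assigned, mapper_index).
    heap: List[Tuple[int, int]] = [(0, i) for i in range(num_mappers)]
    assignments: List[List[str]] = [[] for _ in range(num_mappers)]
    # Largest files first; the sort is stable, so equal sizes keep input order.
    for name, size in sorted(files.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)
        assignments[idx].append(name)
        heapq.heappush(heap, (load + size, idx))
    return assignments


def mapper_loads(files: Dict[str, int], assignments: List[List[str]]) -> List[int]:
    """Total bytes each mapper has to copy."""
    return [sum(files[name] for name in names) for names in assignments]


if __name__ == "__main__":
    files = {"big": 100, "s1": 10, "s2": 10, "s3": 10}
    plan = assign_files(files, 2)
    print(plan, mapper_loads(files, plan))
```

With one 100-unit file and three 10-unit files on two mappers, one mapper carries 100 units and the other 30. In this model, adding more mappers would not shorten that job, because the single large file cannot be split.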

If it doesn't work for you, then there's probably something wrong with your cluster or network, for example:
* the bandwidth between clusters is lower than you expect
* you are limited by the number of mappers you can run with DistCp (the DistCp configuration or the cluster setup)
* you have lots and lots of small files
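
The three causes above can be combined into a rough back-of-envelope model. The sketch below is an assumption-laden estimate, not anything DistCp computes. It takes the aggregate throughput as the smaller of the inter-cluster link rate and the combined rate of all mappers, and it charges every file a fixed, hypothetical per-file overhead that is shared evenly across the mappers. Every parameter is a value you would have to measure on your own clusters.

```python
def estimate_copy_seconds(total_bytes: float, num_files: int,
                          link_bytes_per_s: float, num_mappers: int,
                          per_mapper_bytes_per_s: float,
                          per_file_overhead_s: float) -> float:
    """Hypothetical estimate of wall-clock copy time (ignores mapper skew)."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be >= 1")
    # Data moves no faster than the slower of the link and all mappers combined.
    effective_rate = min(link_bytes_per_s, num_mappers * per_mapper_bytes_per_s)
    transfer_s = total_bytes / effective_rate
    # Fixed cost per file (assumed), spread evenly over the mappers.
    overhead_s = num_files * per_file_overhead_s / num_mappers
    return transfer_s + overhead_s


def bottleneck(link_bytes_per_s: float, num_mappers: int,
               per_mapper_bytes_per_s: float) -> str:
    """Which of the two rate limits applies in this model."""
    if num_mappers * per_mapper_bytes_per_s < link_bytes_per_s:
        return "mappers"
    return "link"


if __name__ == "__main__":
    # Arbitrary units: 1000 bytes, link of 100 bytes/s, 10 bytes/s per mapper.
    print(estimate_copy_seconds(1000, 10, 100, 4, 10, 2), bottleneck(100, 4, 10))
    print(estimate_copy_seconds(1000, 10, 100, 20, 10, 2), bottleneck(100, 20, 10))
    # The same bytes spread over many small files: per-file overhead dominates.
    print(estimate_copy_seconds(1000, 10000, 100, 20, 10, 2))
```

Once the link is saturated, adding mappers only reduces the per-file term. With many small files, that term can outweigh the transfer itself.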

If you can show that your cluster and the network have the capacity to copy large files (hint: use one of the many Linux command-line network bandwidth test tools to measure that bandwidth before going near Hadoop), then consider filing a bug. Even then, as nobody else is seeing it, you are going to have to be the person to debug and fix it.
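
The Python sketch below illustrates the kind of raw measurement meant here. It times a plain TCP transfer on the receiving side, from accepting the connection to end-of-stream. It is a hypothetical stand-in for a dedicated bandwidth test tool, which remains the better choice. The demo runs over loopback. To test the real inter-cluster path, you would bind the server on a host in one cluster and call `send_bytes` from a host in the other. The host, port, chunk size, and transfer size are all assumptions.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call (arbitrary choice)


def run_receiver(server: socket.socket, result: dict) -> None:
    """Accept one connection, read until EOF, and record bytes and rate."""
    conn, _ = server.accept()
    with conn:
        start = time.perf_counter()
        received = 0
        while True:
            data = conn.recv(CHUNK)
            if not data:  # sender closed the connection
                break
            received += len(data)
        elapsed = time.perf_counter() - start
    result["bytes"] = received
    result["bytes_per_s"] = received / elapsed if elapsed > 0 else float("inf")


def send_bytes(host: str, port: int, total_bytes: int) -> int:
    """Connect, stream total_bytes of zeros, then close the connection."""
    payload = bytes(CHUNK)
    sent = 0
    with socket.create_connection((host, port)) as sock:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            sock.sendall(payload[:n])
            sent += n
    return sent


def loopback_test(total_bytes: int = 64 * 1024 * 1024) -> dict:
    """Run the receiver and sender in one process over 127.0.0.1."""
    server = socket.create_server(("127.0.0.1", 0))  # port 0: OS picks a free port
    port = server.getsockname()[1]
    result: dict = {}
    receiver = threading.Thread(target=run_receiver, args=(server, result))
    receiver.start()
    send_bytes("127.0.0.1", port, total_bytes)
    receiver.join()
    server.close()
    return result


if __name__ == "__main__":
    r = loopback_test()
    print(f"{r['bytes']} bytes at {r['bytes_per_s'] * 8 / 1e6:.0f} Mbit/s (loopback)")
```

The loopback figure only shows that the script works. The number that matters is the one measured between the two sites, and it should be compared with how fast the DistCp job actually ran.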
                
> Distcp issue
> ------------
>
>                 Key: HADOOP-9475
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9475
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Sambit Sahoo
>
> 2013-04-13 05:11:43,327 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
> 2013-04-13 05:11:43,439 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
> 2013-04-13 05:11:43,750 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
> 2013-04-13 05:11:43,981 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
> 2013-04-13 05:31:08,282 INFO org.apache.hadoop.mapred.Task: Task:attempt_201302011155_224614_m_000009_0 is done. And is in the process of commiting
> 2013-04-13 05:31:09,359 INFO org.apache.hadoop.mapred.Task: Task attempt_201302011155_224614_m_000009_0 is allowed to commit now
> 2013-04-13 05:31:09,937 INFO org.apache.hadoop.mapred.FileOutputCommitter: Saved output of task 'attempt_201302011155_224614_m_000009_0' to /tmp/_distcp_logs_spti36
> 2013-04-13 05:31:09,939 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201302011155_224614_m_000009_0' done.
> 2013-04-13 05:31:09,942 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> I am facing some delay during distcp from one cluster to another.
> Here I am copying Snappy-compressed data.
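
The quoted log already contains useful timing data. The sketch below is an illustrative parser that computes the span of the map attempt and the largest gap between consecutive log lines. Its regular expression is an assumption fitted to the `2013-04-13 05:11:43,327`-style prefix on these lines. For this attempt, the whole task took about 1,167 seconds (just under 19.5 minutes). Almost all of that time falls between the native-zlib initialization and the "is done" message, which is presumably when the mapper was copying data.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Lines copied verbatim from the task log quoted in this issue.
LOG = """\
2013-04-13 05:11:43,327 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2013-04-13 05:11:43,439 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2013-04-13 05:11:43,750 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2013-04-13 05:11:43,981 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2013-04-13 05:31:08,282 INFO org.apache.hadoop.mapred.Task: Task:attempt_201302011155_224614_m_000009_0 is done. And is in the process of commiting
2013-04-13 05:31:09,359 INFO org.apache.hadoop.mapred.Task: Task attempt_201302011155_224614_m_000009_0 is allowed to commit now
2013-04-13 05:31:09,937 INFO org.apache.hadoop.mapred.FileOutputCommitter: Saved output of task 'attempt_201302011155_224614_m_000009_0' to /tmp/_distcp_logs_spti36
2013-04-13 05:31:09,939 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201302011155_224614_m_000009_0' done.
2013-04-13 05:31:09,942 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
"""

# Timestamp prefix, tolerating an optional e-mail quote marker ("> ").
TS_RE = re.compile(r"^>?\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")


def parse(lines: List[str]) -> List[Tuple[datetime, str]]:
    """Return (timestamp, line) for every line that starts with a timestamp."""
    entries = []
    for line in lines:
        m = TS_RE.match(line)
        if m:
            # %f accepts the 3-digit millisecond field and right-pads it.
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
            entries.append((ts, line))
    return entries


def task_duration(lines: List[str]) -> float:
    """Seconds between the first and last timestamped lines."""
    entries = parse(lines)
    return (entries[-1][0] - entries[0][0]).total_seconds()


def largest_gap(lines: List[str]) -> Tuple[float, str, str]:
    """Largest gap in seconds between consecutive lines, with both lines."""
    entries = parse(lines)
    best = (0.0, "", "")
    for (t0, l0), (t1, l1) in zip(entries, entries[1:]):
        gap = (t1 - t0).total_seconds()
        if gap > best[0]:
            best = (gap, l0, l1)
    return best


if __name__ == "__main__":
    lines = LOG.splitlines()
    print(f"task span: {task_duration(lines):.3f} s")
    gap, before, after = largest_gap(lines)
    print(f"largest gap: {gap:.3f} s\n  after:  {before}\n  before: {after}")
```

On its own, this shows only that one mapper spent its time in the copy phase. Comparing the bytes that mapper copied against the measured link bandwidth would show whether the network or something else is the limit.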

--
This message is automatically generated by JIRA.