Posted to mapreduce-issues@hadoop.apache.org by "Christoph Schmitz (JIRA)" <ji...@apache.org> on 2011/08/16 08:54:27 UTC
[jira] [Created] (MAPREDUCE-2845) Default replication level mapred.submit.replication=10 causes warnings on small clusters
Default replication level mapred.submit.replication=10 causes warnings on small clusters
----------------------------------------------------------------------------------------
Key: MAPREDUCE-2845
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2845
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: client, distributed-cache
Affects Versions: 0.20.1
Environment: Cloudera CDH 2 (hadoop-0.20 0.20.1+169.127-1~lenny-cdh2)
Reporter: Christoph Schmitz
Priority: Minor
By default, job jars, libjars, and distributed cache files in general are written with a replication level of mapred.submit.replication=10. On small clusters (fewer than 10 data nodes), this causes fsck ("hadoop fsck") to report under-replication warnings for these files on HDFS.
Example on an 8-node cluster:
{quote}
/tmp/hadoop/mapred/system/job_201105191458_1857/job.jar: Under replicated blk_-6996370258385460742_366223. Target Replicas is 10 but found 8 replica(s).
{quote}
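To make the warning above easier to act on, here is a minimal sketch that pulls the target and found replica counts out of such an fsck line and checks whether the target can be met at all on a cluster of a given size. The class and method names are hypothetical, and the pattern is derived only from the single example line quoted above, so other fsck versions may word the message differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop.
final class FsckUnderReplication {
    // Matches the tail of the warning line quoted in this report.
    private static final Pattern TARGET_AND_FOUND =
        Pattern.compile("Target Replicas is (\\d+) but found (\\d+) replica\\(s\\)");

    /** Returns {target, found}, or null if the line carries no such warning. */
    static int[] parse(String fsckLine) {
        Matcher m = TARGET_AND_FOUND.matcher(fsckLine);
        if (!m.find()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2))
        };
    }

    /**
     * True when the requested replication exceeds the number of data nodes,
     * i.e. the warning cannot go away through re-replication alone.
     */
    static boolean targetExceedsCluster(String fsckLine, int dataNodes) {
        int[] counts = parse(fsckLine);
        return counts != null && counts[0] > dataNodes;
    }
}
```

For the 8-node example, the target of 10 exceeds the node count, so the block stays flagged until the file is deleted or its replication is lowered.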
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2845) Default replication level mapred.submit.replication=10 causes warnings on small clusters
Posted by "Harsh J (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096751#comment-13096751 ]
Harsh J commented on MAPREDUCE-2845:
------------------------------------
The property 'mapred.submit.replication' is configurable for that reason; or is this a request to lower the default value?
Indeed, this would also hamper decommissioning efforts when an admin doesn't know about it.
I think making it dynamic makes sense, but it would possibly add RPC overhead to each submission.
[jira] [Commented] (MAPREDUCE-2845) Default replication level mapred.submit.replication=10 causes warnings on small clusters
Posted by "Christoph Schmitz (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115434#comment-13115434 ]
Christoph Schmitz commented on MAPREDUCE-2845:
----------------------------------------------
My original intent was to have the system automatically apply something like mapred.submit.replication = min(10, number_of_nodes), so that users on small clusters would not have to set mapred.submit.replication manually at job submission to avoid the under-replication warnings.
I see your point, though, that this would introduce dependencies between HDFS and Map/Reduce.
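The min(10, number_of_nodes) idea can be sketched as a small pure function. This is only an illustration under stated assumptions: the class and method names are hypothetical, and how the client would learn the live data node count is deliberately left out, since that lookup is the extra per-submission cost and the HDFS dependency discussed in this thread.

```java
// Hypothetical helper, not part of Hadoop.
final class SubmitReplication {
    /**
     * Caps the configured submit replication at the number of live data nodes.
     *
     * @param configured    value of mapred.submit.replication (10 by default)
     * @param liveDataNodes data node count, obtained by some means not shown here
     */
    static int effectiveReplication(int configured, int liveDataNodes) {
        if (configured < 1) {
            throw new IllegalArgumentException("replication must be at least 1");
        }
        // Never drop below one replica, even if no data nodes are reported.
        return Math.max(1, Math.min(configured, liveDataNodes));
    }
}
```

With the default of 10, an 8-node cluster would get a replication of 8, while a 50-node cluster would keep 10.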