Posted to mapreduce-user@hadoop.apache.org by Christoph Schmitz <Ch...@1und1.de> on 2011/08/15 15:40:15 UTC

Under-replication warnings for Distributed Cache?

Hi,

we're running an 8-node Hadoop cluster with CDH2. Recently, our monitoring tools caught warnings like this one when fsck'ing the HDFS:

/tmp/hadoop-tgp/mapred/system/job_201105191458_1857/job.jar:  Under replicated blk_-6996370258385460742_366223. Target Replicas is 10 but found 8 replica(s).
// Lots more like it on every file in the Distributed Cache.

Obviously, this means that the default replication factor of mapred.submit.replication=10 cannot be reached since we only have 8 datanodes. I found the place in the code (JobClient.java) where this property is consumed and used for replicating the job jar and the Distributed Cache, so I understand (kind of ;-) where the warning comes from.
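
To make the reasoning above concrete, here is a minimal sketch of how a monitoring script might pull the target and actual replica counts out of such an fsck line and compare the target with the cluster size. The class and method names (FsckWarning, parse) are hypothetical, and the regular expression only assumes the message wording shown in the line quoted above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FsckWarning {
    // Matches the tail of the fsck warning quoted above (wording assumed from that sample).
    private static final Pattern UNDER_REPLICATED = Pattern.compile(
        "Target Replicas is (\\d+) but found (\\d+) replica\\(s\\)");

    /** Returns {target, found}, or null if the line is not an under-replication warning. */
    public static int[] parse(String line) {
        Matcher m = UNDER_REPLICATED.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    public static void main(String[] args) {
        String line = "/tmp/hadoop-tgp/mapred/system/job_201105191458_1857/job.jar:  "
            + "Under replicated blk_-6996370258385460742_366223. "
            + "Target Replicas is 10 but found 8 replica(s).";
        int[] counts = parse(line);
        int dataNodes = 8; // cluster size mentioned in this post
        // A target above the datanode count can never be satisfied.
        System.out.println("target=" + counts[0] + " found=" + counts[1]
            + " reachable=" + (counts[0] <= dataNodes));
    }
}
```

For the sample line this prints target=10 found=8 reachable=false, which matches the situation described: the target can never be met on 8 datanodes.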

Still, I have two questions: Shouldn't there be an automatic limit of mapred.submit.replication to the number of data nodes? And more generally, should I worry about this warning?
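
The automatic limit asked about here could look roughly like the sketch below. This is only an illustration of the idea, not what JobClient does; SubmitReplication and capped are hypothetical names, and the number of live datanodes is assumed to be known to the caller.

```java
public class SubmitReplication {
    /**
     * Hypothetical helper: clamps a configured replication factor (for example
     * the value of mapred.submit.replication) to the number of live datanodes,
     * never going below 1.
     */
    public static int capped(int configured, int liveDataNodes) {
        if (configured < 1) {
            throw new IllegalArgumentException("replication must be >= 1");
        }
        return Math.max(1, Math.min(configured, liveDataNodes));
    }

    public static void main(String[] args) {
        System.out.println(capped(10, 8));  // prints 8: the cluster is smaller than the default
        System.out.println(capped(10, 20)); // prints 10: the default already fits
    }
}
```

On the 8-node cluster described here, the default of 10 would be reduced to 8, which is the value fsck reports as actually found.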
 
Thanks and best regards,

Christoph

-- 
Christoph Schmitz

1&1 Internet AG
Ernst-Frey-Straße 10 · DE-76135 Karlsruhe
Phone: +49 721 91374-6733
christoph.schmitz@1und1.de

Board of Directors: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
Chairman of the Supervisory Board: Michael Scheeren

Re: Under-replication warnings for Distributed Cache?

Posted by Christoph Schmitz <Ch...@1und1.de>.

> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Tuesday, August 16, 2011 07:15
> To: mapreduce-user@hadoop.apache.org
> Subject: Re: Under-replication warnings for Distributed Cache?
> 
> On Mon, Aug 15, 2011 at 7:10 PM, Christoph Schmitz
> <Ch...@1und1.de> wrote:
> > Still, I have two questions: Shouldn't there be an automatic limit of mapred.submit.replication to the number of data nodes? And more generally, should I worry about this warning?
> 
> 1> That'd somehow bind MR to HDFS more tightly. Besides, your HDFS
> won't pick nodes for job jar writes and similar forms of replication
> if they report themselves as busy (loaded) or unavailable. The idea
> sounds good though, please file a JIRA for this?

Ok, will do! Thanks for the help,

Christoph

Re: Under-replication warnings for Distributed Cache?

Posted by Harsh J <ha...@cloudera.com>.

On Mon, Aug 15, 2011 at 7:10 PM, Christoph Schmitz
<Ch...@1und1.de> wrote:
> Still, I have two questions: Shouldn't there be an automatic limit of mapred.submit.replication to the number of data nodes? And more generally, should I worry about this warning?

1> That'd somehow bind MR to HDFS more tightly. Besides, your HDFS
won't pick nodes for job jar writes and similar forms of replication
if they report themselves as busy (loaded) or unavailable. The idea
sounds good though, please file a JIRA for this?

2> No, there's no need to worry. You can safely lower the number if
you'd like and get rid of these under-replication issues.

-- 
Harsh J