Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2011/01/27 05:21:29 UTC

[Hadoop Wiki] Update of "FAQ" by QwertyManiac

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "FAQ" page has been changed by QwertyManiac.
The comment on this change is: Reading cluster configuration values in Job..
http://wiki.apache.org/hadoop/FAQ?action=diff&rev1=89&rev2=90

--------------------------------------------------

  == What is the Distributed Cache used for? ==
  The distributed cache is used to distribute to the cluster large, read-only files that are needed by map/reduce jobs. The framework will copy the necessary files from a URL (either hdfs: or http:) onto the slave node before any tasks for the job are executed on that node. The files are copied only once per job and so should not be modified by the application.
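  
  A minimal sketch of registering a cache file from job-setup code (the HDFS path and method name below are hypothetical):
  {{{
  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.filecache.DistributedCache;

  public static void addLookupFile(Configuration conf) throws Exception {
    // Register a read-only file already in HDFS; the framework copies it to each
    // slave node before any task of the job runs there.
    DistributedCache.addCacheFile(new URI("hdfs:///user/me/lookup.dat"), conf);
  }
  }}}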
  
+ == How do I get my MapReduce Java program to read the cluster's configuration and not just the defaults? ==
+ The configuration property files ({core|mapred|hdfs}-site.xml), available in the various '''conf/''' directories of your Hadoop installation, need to be on the '''CLASSPATH''' of your Java application for them to be found and applied. Another way of ensuring that configuration set on the cluster is not overridden by any job is to mark those properties as final; for example:
+ {{{
+ <property>
+   <name>mapreduce.task.io.sort.mb</name>
+   <value>400</value>
+   <final>true</final>
+ </property>
+ }}}
+ 
+ Marking configuration properties as final is common practice for administrators, as noted in the [[http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/conf/Configuration.html|Configuration]] API docs.
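+ 
+ If the '''conf/''' directory cannot be placed on the classpath, the site files can also be loaded into the Configuration explicitly. A minimal sketch, assuming a hypothetical /etc/hadoop/conf location for the cluster's files and a hypothetical class name:
+ {{{
+ import org.apache.hadoop.conf.Configuration;
+ import org.apache.hadoop.fs.Path;
+ 
+ public class ShowClusterConf {
+   public static void main(String[] args) {
+     Configuration conf = new Configuration();
+     // Load the cluster's site files directly instead of relying on the classpath
+     conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
+     conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
+     conf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));
+     // Should print the cluster's value (e.g. 400), not the compiled-in default
+     System.out.println(conf.get("mapreduce.task.io.sort.mb"));
+   }
+ }
+ }}}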
+ 
  == Can I create/write-to HDFS files directly from map/reduce tasks? ==
  Yes. (Clearly, you may want this if you need to create/write-to files other than the output file written out by [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/OutputCollector.html|OutputCollector]].)
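  
  A minimal sketch from inside an old-API task, assuming a hypothetical side-file name and helper method (writing under the task's work output path keeps side files consistent under speculative execution):
  {{{
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobConf;

  // Call from within a task, e.g. from configure(JobConf job) or map()
  public static void writeSideFile(JobConf job) throws java.io.IOException {
    FileSystem fs = FileSystem.get(job);
    Path sideFile = new Path(FileOutputFormat.getWorkOutputPath(job), "side-data.txt");
    FSDataOutputStream out = fs.create(sideFile);
    out.writeUTF("extra output written directly to HDFS");
    out.close();
  }
  }}}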