Posted to common-dev@hadoop.apache.org by "Milind Bhandarkar (JIRA)" <ji...@apache.org> on 2007/10/31 21:11:50 UTC
[jira] Issue Comment Edited: (HADOOP-1917) Need configuration guides for Hadoop

    [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539171 ] 

milindb edited comment on HADOOP-1917 at 10/31/07 1:10 PM:
---------------------------------------------------------------------

Comments on HADOOP-1917

Overview.html:

"Hadoop was been" -> "Hadoop has been"
"Optionally install rsync must be installed" -> "Optionally install rsync"
"build it with ant" -> what's the ant target?
what's the default for HADOOP_LOG_DIR?
"$ bin/hadoop dfs -put input input" -> "$ bin/hadoop dfs -put conf input"

should there be a step to examine the web UI for the JobTracker (JT) and NameNode (NN)?


setup.html:

HADOOP_HEAPSIZE -> need some typical values here
"where the NameNode stores the name table" -> "where the NameNode stores the namespace and transaction logs persistently"
"server and client machines." -> need to document early that the NameNode and JobTracker are server machines, and "DataNode+TaskTracker" are client machines
"slave processors" -> please use consistent terminology; prefer "worker" to "slave"
argh... "slaves" is hardcoded as the file name conf/slaves in Hadoop. I should probably file a JIRA

Also, mapred.map.tasks and mapred.reduce.tasks should *not* be marked final in typical cases.
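
A minimal sketch of what this could look like in the site configuration file, assuming the <final> element is how the guide marks parameters final. The values 10 and 2 are placeholders for illustration only, not recommendations:

```xml
<!-- Hypothetical hadoop-site.xml fragment; the values are illustrative. -->
<configuration>
  <property>
    <name>mapred.map.tasks</name>
    <value>10</value>
    <!-- No <final>true</final> here, so an individual job can override it. -->
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>2</value>
    <!-- Likewise left non-final: jobs usually pick their own reduce count. -->
  </property>
</configuration>
```

Leaving these two non-final keeps them as cluster-wide defaults rather than hard limits.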

mapred_tutorial.html:

consider removing the Google MapReduce paper link as a prerequisite, since the goal of the tutorial is to provide all the information needed to understand map-reduce
A picture would help in the overview.
In the Input and Output section, remove the use of combiner.
In the wordcount example, simplify it even more by avoiding the use of ToolRunner
"submission amp;" -> "submission and"
"de-initialization" -> "finalization? clean-up?"
wherever overriding is mentioned, also mention the default value, e.g. partitioner, inputformat, inputsplit etc.
please provide a javadoc link to DistributedCache at the first mention


Overall comments: This is extremely useful. However, the level of detail is overwhelming for a MapReduce tutorial. Maybe split this into two, Basic and Advanced? Basic should be enough to understand WordCount, and Advanced should then go into all the details.

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can, etc., which are not clearly documented from a user's or admin's perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top-down configuration and use guides for map/reduce and HDFS. These should probably be in Forrest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.
