
Posted to common-dev@hadoop.apache.org by "Sameer Paranjpye (JIRA)" <ji...@apache.org> on 2007/09/18 21:27:43 UTC

[jira] Created: (HADOOP-1917) Need configuration guides for Hadoop

Need configuration guides for Hadoop
------------------------------------

                 Key: HADOOP-1917
                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
             Project: Hadoop
          Issue Type: Improvement
          Components: conf
    Affects Versions: 0.14.1
            Reporter: Sameer Paranjpye
            Priority: Critical
             Fix For: 0.15.0


We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can, etc., which are not clearly documented from a user's or admin's perspective. There is some Javadoc, but most of the "documentation" exists either in JIRA or in the default config files themselves.

We should generate top-down configuration and use guides for map/reduce and HDFS. These should probably be in Forrest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Status: Patch Available  (was: Reopened)

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch

[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539171 ] 

Milind Bhandarkar commented on HADOOP-1917:
-------------------------------------------

Comments on HADOOP-1917

Overview.html:

"Hadoop was been" -> "Hadoop has been"
"Optionally install rsync must be installed" -> "Optionally install rsync"
"build it with ant" -> *what's the ant target?*
What's the default for HADOOP_LOG_DIR?
"$ bin/hadoop dfs -put input input" -> "$ bin/hadoop dfs -put conf input"

Should there be a step to examine the web UI for the JT and NN?


setup.html:

HADOOP_HEAPSIZE -> *need some typical values here?*
"where the NameNode stores the name table" -> "where the NameNode stores the namespace and transaction logs persistently"
"server and client machines." -> *need to document early that NameNode and JobTracker are server machines, and "DataNode+TaskTracker" are client machines*
"slave processors" -> *please use consistent terminology, prefer "worker" to "slave"*
*argh.. "slaves" name is hardcoded as a file name conf/slaves in hadoop. I should probably file a jira*

Also, mapred.map.tasks and mapred.reduce.tasks should *not* be marked final in typical cases.

mapred_tutorial.html:

*consider removing google mapreduce paper link as prerequisite, since the goal of the tutorial is to provide all the information needed to understand map-reduce*
*A picture would help in the overview.*
*In the Input and Output section, remove the use of combiner.*
*In the wordcount example, simplify it even more by avoiding the use of ToolRunner*
"submission amp;" -> "submission and"
"de-initialization" -> "finalization? clean-up?"
*wherever overriding is mentioned, also mention the default value, e.g. partitioner, inputformat, inputsplit etc.*
*please provide a javadoc link to DistributedCache at the first mention*


Overall comments: This is extremely useful. However, the level of detail is overwhelming for a MapReduce tutorial. Maybe split this into two, Basic and Advanced? Basic should be enough to understand WordCount, and Advanced should then go into all the details.

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch

[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Attachment: HADOOP-1917_4_20071105.patch

Thanks to Nigel, Milind and Corrine for their extensive feedback, much appreciated!

Some comments:

@Nigel
  * I've checked that api/index.html?<> works; I couldn't get Forrest to accept any other form of URLs for the javadocs (long story!). I'll gladly change it if someone knows a better way. *smile*
  * There is some coverage of the {{combiner}} in the {{Mapper}} section.

@Milind
   * Let's keep a single tutorial, which covers all details, for now. Having one with only the example doesn't seem right. We can always revisit this later...

Ok, here is another go at it...

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch

[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538660 ] 

Nigel Daley commented on HADOOP-1917:
-------------------------------------

Comments (nits) on setup.xml

replace <strong> with <code> in a number of places (environment variables, commands, etc)

"Installing a hadoop cluster is just a simple step on ensuring that the software is distributed to all the machines in the cluster." -> "Installing a Hadoop cluster typically involves unpacking the software on all the machines in the cluster."

"found in" -> "found in the"

"the hadoop scripts provided, found in the" -> "the Hadoop scripts found in the"

"the hadoop-daemons" -> "the Hadoop daemons"

"Environment of Hadoop Daemons" -> "Setting the Hadoop Daemons Environment"

"necessary configuration parameters" -> "necessary <em>configuration parameters</em>"

"Other possible knobs to tweak are:" -> "Other useful configuration parameters to customize are:"

State whether or not HADOOP_LOG_DIR already has to exist or whether it will be created if it doesn't.

Do you need to define somewhere what "Hadoop daemons" are?

"machines act as both" -> "machines act as both a"

"i.e. the" -> "and are referred to as"

"Hadoop Daemons' logs" -> "Hadoop daemons logging configuration"

"Bootup Hadoop" -> "Startup Hadoop"

"is the small matter of" -> "involves"

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch

[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Component/s:     (was: conf)
                 documentation

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.15.1
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch

[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Fix Version/s:     (was: 0.15.0)
                   0.16.0

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0

[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Attachment:     (was: HADOOP-1917_7_20071110.patch)

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch

[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541524 ] 

Hadoop QA commented on HADOOP-1917:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12369271/HADOOP-1917_7_20071110.patch
against trunk revision r593708.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1087/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1087/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1087/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1087/console

This message is automatically generated.

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.15.1
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch

[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540134 ] 

Nigel Daley commented on HADOOP-1917:
-------------------------------------

Ok, final set of comments on the tutorial:

Application typically implement -> 
Applications typically implement

These represent the core -> 
These form the core

<code>Mapper</code> implementations can access the <code>JobConf</code> ... -> 
<code>Mapper</code> implementations are passed the <code>JobConf</code> via the ... (discuss the ordering guarantees of the calls made to the Mapper methods: configure, map, close)

"de-initialization" -> "finalization" or "tear down" or "cleanup"

(the above 2 comments also apply to the Reducer section)
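
To illustrate the call ordering raised in the comments above (configure, then map once per record, then close), here is a minimal Java sketch. It deliberately does not use the Hadoop API: ToyMapper, RecordingMapper and runTask are hypothetical names invented for illustration, and running close in a finally block is an assumption of this sketch rather than a guarantee stated in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy driver showing one possible call order; not the Hadoop framework. */
final class LifecycleSketch {

    /** Hypothetical, simplified stand-in for the Mapper contract. */
    interface ToyMapper {
        void configure(String jobName);      // once, before any map() call
        void map(String key, String value);  // once per input record
        void close();                        // once, after the last map() call
    }

    /** Records the order in which the driver invokes its methods. */
    static final class RecordingMapper implements ToyMapper {
        final List<String> calls = new ArrayList<>();
        public void configure(String jobName) { calls.add("configure"); }
        public void map(String key, String value) { calls.add("map:" + key); }
        public void close() { calls.add("close"); }
    }

    /** Drives one task: configure, then map for each record, then close. */
    static void runTask(ToyMapper mapper, String jobName, List<String[]> records) {
        mapper.configure(jobName);
        try {
            for (String[] record : records) {
                mapper.map(record[0], record[1]);
            }
        } finally {
            mapper.close(); // in this sketch, clean-up runs even if map() throws
        }
    }

    public static void main(String[] args) {
        RecordingMapper mapper = new RecordingMapper();
        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"0", "hello world"});
        records.add(new String[] {"12", "hello hadoop"});
        runTask(mapper, "wordcount", records);
        System.out.println(mapper.calls);
    }
}
```

The same shape would apply to a Reducer-like component, with a per-key reduce call in place of map.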

"The framework then calls" makes it sound like you were previously talking about the sequencing of calls (which I don't think you were)

"to report progress, status, counters and so on, or just indicate that they are alive" -> "to report progress, status, and counters" (it looks like that's all you can do with the Reporter interface)

(the above comment also applies to the Reducer section)

"The grouped <code>Mapper</code> outputs are partitioned per <code>Reducer</code>" (I think this concept needs more explanation as it's not obvious to the new user)
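
As one way to make the partitioning concept concrete for a new user, the following self-contained Java sketch assigns each key to a reducer index by hashing. The class and method names (HashPartitionSketch, partition, assign) are hypothetical. The formula is the one commonly associated with hash partitioning in Hadoop, but treat that as an assumption and check the HashPartitioner Javadoc for the actual behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy illustration of hash partitioning; not Hadoop's own class. */
final class HashPartitionSketch {

    /** Maps a key to a reducer index in [0, numReduceTasks). */
    static int partition(Object key, int numReduceTasks) {
        // Clear the sign bit so a negative hashCode cannot yield a negative index.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Splits keys into one bucket per reducer. */
    static List<List<String>> assign(List<String> keys, int numReduceTasks) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partition(key, numReduceTasks)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("hadoop", "hdfs", "map", "reduce", "hadoop");
        List<List<String>> buckets = assign(keys, 3);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("reducer " + i + " <- " + buckets.get(i));
        }
    }
}
```

The property worth noticing is that equal keys always land in the same bucket, so all values for a given key reach the same reducer regardless of which map task emitted them.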

which is only a hint -> 
which only provides a hint

conjunction to simulate -> 
conjunction to simulate a

If equivalence rules for keys while grouping the intermediates are different from those for grouping keys before reduction ->
If equivalence rules for grouping the intermediate keys are required to be different from those for grouping keys before reduction
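
The distinction in the sentence above, between the rule used to order intermediate keys and the rule used to group them before reduction, can be sketched in plain Java. Everything here (GroupingSketch, Key, SORT, GROUP, sortAndGroup) is a hypothetical illustration and not the Hadoop API; it only assumes that keys are first sorted with one comparator and then grouped with a coarser one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy illustration of sorting with one rule and grouping with another. */
final class GroupingSketch {

    /** Composite key: a natural key plus a secondary field used only for ordering. */
    static final class Key {
        final String natural;
        final int secondary;
        Key(String natural, int secondary) {
            this.natural = natural;
            this.secondary = secondary;
        }
        @Override public String toString() { return natural + "#" + secondary; }
    }

    // Sort order: natural key first, then the secondary field.
    static final Comparator<Key> SORT =
        Comparator.comparing((Key k) -> k.natural).thenComparingInt(k -> k.secondary);

    // Grouping rule: two keys belong together when their natural parts match.
    static final Comparator<Key> GROUP = Comparator.comparing((Key k) -> k.natural);

    /** Sorts with SORT, then starts a new group whenever GROUP sees a different key. */
    static List<List<Key>> sortAndGroup(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<Key>> groups = new ArrayList<>();
        Key previous = null;
        for (Key key : sorted) {
            if (previous == null || GROUP.compare(previous, key) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(key);
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Key> keys = Arrays.asList(
            new Key("b", 2), new Key("a", 3), new Key("b", 1), new Key("a", 1));
        System.out.println(sortAndGroup(keys));
    }
}
```

Each group would correspond to a single reduce call, and within a group the entries arrive ordered by the secondary field.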

<em>not re-sorted</em> -> 
<em>not sorted</em> by the framework

<code>zero</code> -> <em>zero</em>

is sent for reduction -> is sent to for reduction

possibly link to HashPartitioner javadoc

insignificant amount of time -> significant amount of time

even to <code>zero</code> -> even to <em>zero</em>
(as written, it looks like the user should do this:
mapred.task.timeout=zero
which is clearly wrong)

job-configuration -> job configuration

Should the job conf section describe how job configs can be set, i.e. command line, programmatically, config files, etc.?

record-oriented view for the -> 
record-oriented view to the

write out the output files ->
write the output files

Tasks' Side-Effect Files ->
Task Side-Effect Files

Some applications need ->
In some applications the tasks need

To avoid thes issues ->
To avoid these issues

completion of the task-attempt ->
completion of the task-attempt,

Applications specify the files, via urls (hdfs:// or http://) to be cached via the <code>JobConf</code> ->
Applications specify the files to be cached via urls (hdfs:// or http://) configured in the <code>JobConf</code>

are only copied once per job and the ability to cache archives which are un-archived on the slaves ->
are copied (and un-archived if necessary) only once per job on each slave


> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch

[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Status: Open  (was: Patch Available)

Minor whitespace-related issues.

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch

[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Attachment: HADOOP-1917_7_20071110.patch

Updated patch, incorporating Doug's feedback.

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch

[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Status: Patch Available  (was: Open)

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch

[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540513 ] 

Nigel Daley commented on HADOOP-1917:
-------------------------------------

+1 (doc review)

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch

[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Attachment: HADOOP-1917_2_20071031.patch

Thanks for the review Nigel...

Here is an updated patch which incorporates Nigel's feedback and has a first-cut of {{mapred_tutorial.html}}. 

I've decided to skip the tuning bit for now. I'd like to see HADOOP-2122 go in before attempting a tuning guide... seems a bit premature for *now*.

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Attachment: HADOOP-1917_3_20071031.patch

Updated patch...

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540506 ] 

Hadoop QA commented on HADOOP-1917:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12369010/HADOOP-1917_6_20071106.patch
against trunk revision r592324.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests -1.  The patch failed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1068/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1068/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1068/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1068/console

This message is automatically generated.

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Status: Open  (was: Patch Available)

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Attachment: HADOOP-1917_6_20071106.patch

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Status: Patch Available  (was: Open)

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539186 ] 

Nigel Daley commented on HADOOP-1917:
-------------------------------------

Here is feedback on the first half of the mapred tutorial from HADOOP-1917_2_20071031.patch:


"serve as a Tutorial" -> "serve as a tutorial"

up-and-running -> running

parallelly -> in parallel

built of commodity -> built with commodity

which processed -> which are processed

in completely -> in a completely

The frameworks sorts -> The framework sorts

in a FileSystem -> in a filesystem

re-executes the failed ones -> re-execution of the failed ones

Normally, the -> Typically the

Hence the framework -> This configuration enables the framework to

of a master -> of a single master

per node in the cluster -> per cluster node

scheduling the jobs' -> scheduling the job's

interfaces/classes -> interfaces or abstract classes

This, and other facets -> These, and other parameter

&amp; monitoring -> and monitoring (appears in a number of places)

to the job-client etc. -> to the job client. (either remove "etc." or expand it out to list more items sent to the job client)

make Hadoop Streaming and Hadoop Pipes sentences bullet points.

I haven't compiled the forrest. Do these types of URLs work?
api/index.html?org/apache/hadoop/streaming/package-summary.html

and/or the reducer. -> and/or the reducer function.

try to avoid <code>interface or class name</code>s (followed by an s).

The <code>key</code>s and <code>value</code>s -> The key and value classes

Additionally the <code>key</code>s -> Additionally, the key class

have to be -> have to implement (then remove trailing 's' from WritableComparable)

Input &amp; Output -> Input and Output

Lets walk through a simple Map-Reduce application before we jump into details to get a flavour for how they work. -> 
Before jumping into details, let's walk through a simple Map-Reduce example to get a flavour for how they work.

WalkThrough -> Walk-through

perhaps you should first talk about what inputs are passed to the map method.

line nos. -> lines (IMO this simplifies the reading)

line no. -> line

line# -> line

output of the each -> output of each 

(same as the -> (the combiner is the same as the

you don't introduce the concept of a combiner -- that may need more explanation (or leave it out of this tutorial)

(word) -> (or word in this example)

of the program -> method

with the given -> method with the given

interfaces/classes -> interfaces and classes (appears in a number of places in different orders)



> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542214 ] 

Hudson commented on HADOOP-1917:
--------------------------------

Integrated in Hadoop-Nightly #302 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/302/])

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.15.1
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541959 ] 

Doug Cutting commented on HADOOP-1917:
--------------------------------------

+1 This is great documentation to have.

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.15.1
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Fix Version/s:     (was: 0.16.0)
                   0.15.1

Marking this for 0.15.1, might as well... since it's only a documentation patch.

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.15.1
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542772 ] 

Hudson commented on HADOOP-1917:
--------------------------------

Integrated in Hadoop-Nightly #304 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/304/])

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.15.1
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Status: Patch Available  (was: Open)

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Attachment: HADOOP-1917_5_20071106.patch

Another go at it... incorporated Nigel's latest comments, simplified {{WordCount}} by removing {{Tool}}-related stuff and added an advanced {{WordCount}} to demonstrate features like {{Tool}}, {{DistributedCache}}, {{Counters}} etc.

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Attachment: HADOOP-1917_7_20071110.patch

Forgot to grant license to ASF...

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Attachment: HADOOP-1917_1_20071025.patch

Here is an early patch for some forrest-based guides to get some feedback. 

It introduces:
   * {{quickstart.html}} - For first-time users, including details on single-node setup etc.
   * {{setup.html}} - Helps admins set up non-trivial Hadoop clusters

Todo:
   * {{mapred-tutorial.html}} - Extensive tutorial on Map-Reduce, including a walk-through of some examples to help users understand and implement applications.
   * {{tuning.html}} - Documentation of various hdfs/mapred parameters.


> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Resolved: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das resolved HADOOP-1917.
---------------------------------

    Resolution: Duplicate

This issue is handled in HADOOP-1861 and HADOOP-2046

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Priority: Critical
>             Fix For: 0.15.0
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Reopened: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy reopened HADOOP-1917:
-----------------------------------

      Assignee: Arun C Murthy

I'll resurrect this jira and use it to track the Hadoop configuration & user guides.

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.15.0
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Issue Comment Edited: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539171 ] 

milindb edited comment on HADOOP-1917 at 10/31/07 1:10 PM:
---------------------------------------------------------------------

Comments on HADOOP-1917

Overview.html:

"Hadoop was been" -> "Hadoop has been"
"Optionally install rsync must be installed" -> "Optionally install rsync"
"build it with ant" -> what's the ant target?
What's the default for HADOOP_LOG_DIR?
"$ bin/hadoop dfs -put input input" -> "$ bin/hadoop dfs -put conf input"

Should there be a step to examine the web UI for the JT and NN?


setup.html:

HADOOP_HEAPSIZE -> need some typical values here?
"where the NameNode stores the name table" -> "where the NameNode stores the namespace and transaction logs persistently"
"server and client machines." -> need to document early that NameNode and JobTracker are server machines, and "DataNode+TaskTracker" are client machines
"slave processors" -> please use consistent terminology, prefer "worker" to "slave"
argh.. the "slaves" name is hardcoded as the file name conf/slaves in hadoop. I should probably file a jira

Also, mapred.map.tasks and mapred.reduce.tasks should *not* be marked final in typical cases.
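
For illustration, a minimal sketch of what this might look like in a site configuration file such as conf/hadoop-site.xml: the two task-count properties are set as site defaults but carry no <final> element, so an individual job's configuration can still override them. The values shown are placeholders assumed for the example, not recommendations.

```xml
<!-- conf/hadoop-site.xml (sketch; values are illustrative assumptions) -->
<configuration>
  <!-- Site-wide defaults that jobs may override: no <final> element. -->
  <property>
    <name>mapred.map.tasks</name>
    <value>10</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>2</value>
  </property>
  <!-- For contrast: adding <final>true</final> inside a <property> would
       prevent configuration loaded later from overriding that value. -->
</configuration>
```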

mapred_tutorial.html:

consider removing google mapreduce paper link as prerequisite, since the goal of the tutorial is to provide all the information needed to understand map-reduce
A picture would help in the overview.
In the Input and Output section, remove the use of combiner.
In the wordcount example, simplify it even more by avoiding the use of ToolRunner
"submission amp;" -> "submission and"
"de-initialization" -> "finalization? clean-up?"
wherever overriding is mentioned, also mention the default value, e.g. partitioner, inputformat, inputsplit etc.
please provide a javadoc link to DistributedCache at the first mention


Overall comments: This is extremely useful. However, the level of detail is overwhelming for a MapReduce tutorial. Maybe split this into two: Basic and Advanced? Basic should be enough to understand WordCount, and Advanced should then go into all the details.

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540107 ] 

Hadoop QA commented on HADOOP-1917:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12368953/HADOOP-1917_4_20071105.patch
against trunk revision r591722.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1061/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1061/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1061/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1061/console

This message is automatically generated.

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Status: Open  (was: Patch Available)

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540283 ] 

Hadoop QA commented on HADOOP-1917:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12368996/HADOOP-1917_5_20071106.patch
against trunk revision r591880.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1065/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1065/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1065/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1065/console

This message is automatically generated.

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538656 ] 

Nigel Daley commented on HADOOP-1917:
-------------------------------------

Looks good.  quickstart.html comments:


replace <strong> with <code> in a number of places (environment variables, commands, etc)

"framework i.e. perform" -> 

TM after Java only needs to be on the first occurrence.

"preferably from Sun." -> "preferably from Sun, must be installed."

sshd -> <code>sshd</code>

"must be installed to manage" -> "to manage"

"to use Hadoop's scripts to manage" -> "to use the Hadoop scripts that manage"

http://cvs.apache.org/dist/lucene/hadoop/nightly -> http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/

"Edit the file" -> "In the unpacked release, edit the file"

"display the documentation" -> "display the usage documentation"

"This is useful for debugging, and can be demonstrated as follows:" -> "This is useful for debugging. The following example copies the unpacked <code>conf</code> directory to use as input and then finds and displays every match of the given regular expression.  Output is written to the given <code>output</code> directory."

remove "This will display counts..."

"can be completely run on a single-node in a pseudo-distributed mode:" -> "can also be run on a single-node in a pseudo-distributed mode (each Hadoop daemon runs in a separate Java process):"

"Use the following" -> "In the unpacked release, edit the file

"A new distributed filesystem must be formatted with the following command:" -> "Format a new Hadoop distributed filesystem:"

"The hadoop daemons are started with the following command:" -> "Start the hadoop daemons:"

"Input files are copied into the distributed filesystem as follows:" -> "Copy input files into the distributed filesystem as follows:"

"Output files are copied from the distributed filesystem as follows:" -> "Optionally, you can copy output files from the distributed filesystem as follows:"


> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540897 ] 

Doug Cutting commented on HADOOP-1917:
--------------------------------------

quickstart.html:

We must, for legal reasons, distinguish between developer documentation and end-user documentation.  End users should not be encouraged to use subversion or nightly releases, since software obtained that way is not distributed under the Apache license as a release is.  So it's best to keep mention of subversion and nightly releases in developer-specific documentation.

We don't document the use of 'rsync' to automatically update Hadoop software on slave nodes at startup, so perhaps we should remove mention of the dependency on rsync from the docs?  (Does anyone use this feature anymore?  If not, perhaps we should remove it.)

setup.html:

The page linked to by the "Install & Configure" menu item is titled "Hadoop Cluster - Setup".  Perhaps both should be replaced with something like "Cluster Configuration", and the file should be called "cluster-config.html"?  In general, the menu items don't need to use the term "Hadoop", while the page titles probably should.

The documentation of mapred.map.tasks and mapred.reduce.tasks here should link to the more detailed description in the mapred tutorial.

mapred_tutorial.html:

The menu entry for this should be just "Map/Reduce Tutorial", without "Hadoop".

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.

[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.15.1
>
>         Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.
