Posted to common-dev@hadoop.apache.org by "Sameer Paranjpye (JIRA)" <ji...@apache.org> on 2007/09/18 21:27:43 UTC
[jira] Created: (HADOOP-1917) Need configuration guides for Hadoop
Need configuration guides for Hadoop
------------------------------------
Key: HADOOP-1917
URL: https://issues.apache.org/jira/browse/HADOOP-1917
Project: Hadoop
Issue Type: Improvement
Components: conf
Affects Versions: 0.14.1
Reporter: Sameer Paranjpye
Priority: Critical
Fix For: 0.15.0
We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can, etc., which are not clearly documented from a user's or admin's perspective. There is some Javadoc present, but most of the "documentation" exists either in JIRA or in the default config files themselves.
We should generate top-down configuration and use guides for map/reduce and HDFS. These should probably be in Forrest and accessible from the project website (Javadoc isn't always approachable for our non-programmer audience). Committers should look for user documentation before accepting patches.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1917:
----------------------------------
Status: Patch Available (was: Reopened)
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch
[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539171 ]
Milind Bhandarkar commented on HADOOP-1917:
-------------------------------------------
Comments on HADOOP-1917
Overview.html:
"Hadoop was been" -> "Hadoop has been"
"Optionally install rsync must be installed" -> "Optionally install rsync"
"build it with ant" -> *what's the ant target?*
what's the default for HADOOP_LOG_DIR?
"$ bin/hadoop dfs -put input input" -> "$ bin/hadoop dfs -put conf input"
should there be a step to examine the web UI for the JT and NN?
setup.html:
HADOOP_HEAPSIZE -> *need some typical values here?*
"where the NameNode stores the name table" -> "where the NameNode stores the namespace and transactions logs persistently"
"server and client machines." -> *need to document early that NameNode and JobTracker are server machines, and "DataNode+TaskTracker" are client machines*
"slave processors" -> *please use consistent terminology, prefer "worker" to "slave"*
*argh... the name "slaves" is hardcoded as the file name conf/slaves in Hadoop. I should probably file a JIRA.*
Also, mapred.map.tasks and mapred.reduce.tasks should *not* be marked final in typical cases.
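To see why marking these two parameters final is usually undesirable, here is a minimal sketch of layered configuration loading. It assumes the rule Hadoop's `Configuration` javadoc describes for final parameters: once a resource declares a value final, later-loaded resources cannot alter it. The `LayeredConf` class is hypothetical and is not Hadoop's API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for layered config loading; NOT org.apache.hadoop.conf.Configuration.
class LayeredConf {
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> finalKeys = new HashSet<>();

    // Load one resource (e.g. a site file, then a job's settings) on top of earlier ones.
    // 'finals' lists the keys this resource declares final.
    void addResource(Map<String, String> resource, Set<String> finals) {
        for (Map.Entry<String, String> e : resource.entrySet()) {
            if (finalKeys.contains(e.getKey())) {
                continue; // locked by an earlier resource: the override is ignored
            }
            values.put(e.getKey(), e.getValue());
        }
        for (String k : finals) {
            if (resource.containsKey(k)) {
                finalKeys.add(k); // lock the key against all later resources
            }
        }
    }

    String get(String key) {
        return values.get(key);
    }

    public static void main(String[] args) {
        LayeredConf conf = new LayeredConf();
        // A site-wide resource marks mapred.reduce.tasks final...
        conf.addResource(Map.of("mapred.reduce.tasks", "1"), Set.of("mapred.reduce.tasks"));
        // ...so a job's own setting no longer takes effect.
        conf.addResource(Map.of("mapred.reduce.tasks", "20"), Set.of());
        System.out.println(conf.get("mapred.reduce.tasks")); // prints 1
    }
}
```

Under this rule a site-level final on mapred.reduce.tasks would override every job's own choice, which is the situation the comment above warns against.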
mapred_tutorial.html:
*consider removing the Google MapReduce paper link as a prerequisite, since the goal of the tutorial is to provide all the information needed to understand map-reduce*
*A picture would help in the overview.*
*In the Input and Output section, remove the use of combiner.*
*In the wordcount example, simplify it even more by avoiding the use of ToolRunner*
"submission amp;" -> "submission and"
"de-initialization" -> "finalization? clean-up?"
*wherever overriding is mentioned, also mention the default value, e.g. partitioner, inputformat, inputsplit etc.*
*please provide a javadoc link to DistributedCache at the first mention*
Overall comments: This is extremely useful. However, the level of detail is overwhelming for a MapReduce tutorial. Maybe split this into two: Basic and Advanced? Basic should be enough to understand WordCount, and Advanced should then go into all the details.
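As an illustration of how small the core WordCount idea is once ToolRunner and job plumbing are set aside, here is an in-memory sketch of the map, combine, shuffle/sort and reduce data flow. It uses no Hadoop classes at all; the class and method names are hypothetical and this is not the tutorial's WordCount code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of WordCount's data flow; uses no Hadoop classes.
class WordCountSketch {

    // "map": emit (word, 1) for every whitespace-separated token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String tok : line.split("\\s+")) {
            if (!tok.isEmpty()) out.add(Map.entry(tok, 1));
        }
        return out;
    }

    // "combine": sum counts locally, per map task, before anything is shuffled.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOut) {
        Map<String, Integer> partial = new TreeMap<>();
        for (Map.Entry<String, Integer> e : mapOut) {
            partial.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return partial;
    }

    // "shuffle + reduce": merge all partial counts by key (TreeMap keeps keys sorted).
    static Map<String, Integer> reduce(List<Map<String, Integer>> combined) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> part : combined) {
            part.forEach((k, v) -> totals.merge(k, v, Integer::sum));
        }
        return totals;
    }

    static Map<String, Integer> run(List<String> splits) {
        List<Map<String, Integer>> combined = new ArrayList<>();
        for (String split : splits) { // one "map task" per input split
            combined.add(combine(map(split)));
        }
        return reduce(combined);
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("Hello World Bye World", "Hello Hadoop Goodbye Hadoop")));
        // prints {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
    }
}
```

The combiner step here shows why it is optional: dropping it changes how much data would be shuffled, not the final totals.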
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch
[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1917:
----------------------------------
Attachment: HADOOP-1917_4_20071105.patch
Thanks to Nigel, Milind and Corrine for their extensive feedback, much appreciated!
Some comments:
@Nigel
* I've checked that api/index.html?<> works; I couldn't get Forrest to accept any other form of URL for the Javadocs (long story!). I'll gladly change it if someone knows a better way. *smile*
* There is some coverage of the {{combiner}} in the {{Mapper}} section.
@Milind
* Let's keep a single tutorial, which covers all the details, for now. Having one with only the example doesn't seem right. We can always revisit this later...
Ok, here is another go at it...
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch
[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538660 ]
Nigel Daley commented on HADOOP-1917:
-------------------------------------
Comments (nits) on setup.xml
replace <strong> with <code> in a number of places (environment variables, commands, etc.)
"Installing a hadoop cluster is just a simple step on ensuring that the software is distributed to all the machines in the cluster." -> "Installing a Hadoop cluster typically involves unpacking the software on all the machines in the cluster."
"found in" -> "found in the"
"the hadoop scripts provided, found in the" -> "the Hadoop scripts found in the"
"the hadoop-daemons" -> "the Hadoop daemons"
"Environment of Hadoop Daemons" -> "Setting the Hadoop Daemons Environment"
"necessary configuration parameters" -> "necessary <em>configuration parameters</em>"
"Other possible knobs to tweak are:" -> "Other useful configuration parameters to customize are:"
State whether or not HADOOP_LOG_DIR already has to exist or whether it will be created if it doesn't.
Do you need to define somewhere what "Hadoop daemons" are?
"machines act as both" -> "machines act as both a"
"i.e. the" -> "and are referred to as"
"Hadoop Daemons' logs" -> "Hadoop daemons logging configuration"
"Bootup Hadoop" -> "Startup Hadoop"
"is the small matter of" -> "involves"
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch
[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1917:
----------------------------------
Component/s: (was: conf)
documentation
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: documentation
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.15.1
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch
[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1917:
----------------------------------
Fix Version/s: (was: 0.15.0)
0.16.0
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1917:
----------------------------------
Attachment: (was: HADOOP-1917_7_20071110.patch)
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch
[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541524 ]
Hadoop QA commented on HADOOP-1917:
-----------------------------------
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12369271/HADOOP-1917_7_20071110.patch
against trunk revision r593708.
@author +1. The patch does not contain any @author tags.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new compiler warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1087/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1087/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1087/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1087/console
This message is automatically generated.
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.15.1
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch
[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540134 ]
Nigel Daley commented on HADOOP-1917:
-------------------------------------
Ok, final set of comments on the tutorial:
Application typically implement ->
Applications typically implement
These represent the core ->
These form the core
<code>Mapper</code> implementations can access the <code>JobConf</code> ... ->
<code>Mapper</code> implementations are passed the <code>JobConf</code> via the ... (discuss the ordering guarantees of the calls made to the Mapper methods: configure, map, close)
"de-initialization" -> "finalization" or "tear down" or "cleanup"
(the above 2 comments also apply to the Reducer section)
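The ordering guarantee requested above can be made concrete with a small sketch. In the `org.apache.hadoop.mapred` API a `Mapper` receives its `JobConf` through `configure` and is shut down through `close`; the sketch below assumes a task calls `configure` once, then `map` once per record, then `close` once. `SimpleMapper` and `LifecycleSketch` are hypothetical plain-Java stand-ins, not Hadoop interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical interface mirroring the shape of the mapred Mapper lifecycle;
// it is NOT the real Hadoop interface.
interface SimpleMapper {
    void configure(Map<String, String> jobConf);             // once, before any record
    void map(String key, String value, List<String> output); // once per record
    void close();                                            // once, after the last record
}

class LifecycleSketch {
    // Assumed task driver: configure, then every record, then close.
    // close() runs even if map() throws, because it sits in a finally block.
    static void runTask(SimpleMapper m, Map<String, String> conf,
                        List<String[]> records, List<String> output) {
        m.configure(conf);
        try {
            for (String[] rec : records) {
                m.map(rec[0], rec[1], output);
            }
        } finally {
            m.close();
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        SimpleMapper tracer = new SimpleMapper() {
            public void configure(Map<String, String> c) { events.add("configure"); }
            public void map(String k, String v, List<String> out) { events.add("map:" + k); }
            public void close() { events.add("close"); }
        };
        runTask(tracer, Map.of(), List.of(new String[]{"0", "a"}, new String[]{"1", "b"}), new ArrayList<>());
        System.out.println(events); // [configure, map:0, map:1, close]
    }
}
```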
"The framework then calls" makes it sound like you were previously talking about the sequencing of calls (which I don't think you were)
"to report progress, status, counters and so on, or just indicate that they are alive" -> "to report progress, status, and counters" (it looks like that's all you can do with the Reporter interface)
(the above comment also applies to the Reducer section)
"The grouped <code>Mapper</code> outputs are partitioned per <code>Reducer</code>" (I think this concept needs more explanation as it's not obvious to the new user)
which is only a hint ->
which only provides a hint
conjunction to simulate ->
conjunction to simulate a
If equivalence rules for keys while grouping the intermediates are different from those for grouping keys before reduction ->
If equivalence rules for grouping the intermediate keys are required to be different from those for grouping keys before reduction
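The idea behind this sentence (sort intermediate keys with one comparator, then group them into reduce calls with a second, coarser one, often called a secondary sort) can be sketched in plain Java. The composite `group#secondary` key format and all names below are hypothetical; this is not Hadoop's comparator API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of "sort by the full key, group by part of it".
// Keys are hypothetical composites of the form "group#secondary".
class GroupingSketch {
    static String groupOf(String key) {
        return key.substring(0, key.indexOf('#'));
    }

    static Map<String, List<String>> sortAndGroup(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.naturalOrder()); // sort on the full composite key
        Map<String, List<String>> groups = new LinkedHashMap<>();
        String current = null;
        for (String k : sorted) {
            String g = groupOf(k);
            if (!g.equals(current)) { // grouping part changed: start a new "reduce call"
                groups.put(g, new ArrayList<>());
                current = g;
            }
            groups.get(g).add(k); // within a group, keys stay in full-key order
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(sortAndGroup(List.of("b#2", "a#9", "b#1", "a#3")));
        // prints {a=[a#3, a#9], b=[b#1, b#2]}
    }
}
```

Each group's values arrive ordered by the secondary part, which is the usual reason for making the grouping rule coarser than the sorting rule.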
<em>not re-sorted</em> ->
<em>not sorted</em> by the framework
<code>zero</code> -> <em>zero</em>
is sent for reduction -> is sent to for reduction
possibly link to HashPartitioner javadoc
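Alongside that link, a short sketch may help explain partitioning to new users. Hadoop's default `HashPartitioner` assigns a key to reducer `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; the plain-Java class below re-implements that formula for illustration only (the class name and sample keys are hypothetical).

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of hash partitioning; not the Hadoop class itself.
class HashPartitionSketch {
    // Same formula as Hadoop's HashPartitioner: clear the sign bit so the result
    // is non-negative, then take it modulo the number of reduce tasks.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Route each key to its reducer's bucket; equal keys always share a bucket.
    static List<List<String>> partition(List<String> keys, int numReduceTasks) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) buckets.add(new ArrayList<>());
        for (String k : keys) buckets.get(getPartition(k, numReduceTasks)).add(k);
        return buckets;
    }

    public static void main(String[] args) {
        System.out.println(partition(List.of("apple", "banana", "apple", "cherry"), 3));
    }
}
```

Because the partition depends only on the key, every value for a given key reaches the same reducer, which is what lets that reducer see the complete group.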
insignificant amount of time -> significant amount of time
even to <code>zero</code> -> even to <em>zero</em>
(as written, it looks like the user should do this:
mapred.task.timeout=zero
which is clearly wrong)
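The intended meaning can be shown with a small sketch of the liveness rule, assuming `mapred.task.timeout` is a number of milliseconds and that a value of 0 disables the check (as the comment implies). Reporting progress or status simply refreshes the "last seen" time. `TimeoutSketch` is hypothetical and is not TaskTracker code.

```java
// Sketch of a millisecond timeout check where 0 means "never time out".
// Hypothetical; not Hadoop's TaskTracker implementation.
class TimeoutSketch {
    private final long timeoutMs;
    private long lastProgressMs;

    TimeoutSketch(long timeoutMs, long nowMs) {
        this.timeoutMs = timeoutMs;
        this.lastProgressMs = nowMs;
    }

    // What reporting progress/status amounts to: refresh the last-seen time.
    void progress(long nowMs) {
        lastProgressMs = nowMs;
    }

    // Hung only if a positive timeout has elapsed without any progress.
    boolean isHung(long nowMs) {
        return timeoutMs > 0 && nowMs - lastProgressMs > timeoutMs;
    }

    public static void main(String[] args) {
        TimeoutSketch t = new TimeoutSketch(600_000L, 0L);    // e.g. 10 minutes
        System.out.println(t.isHung(700_000L));               // true: no progress for 700 s
        TimeoutSketch never = new TimeoutSketch(0L, 0L);      // 0 disables the check
        System.out.println(never.isHung(Long.MAX_VALUE / 2)); // false
    }
}
```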
job-configuration -> job configuration
Should the job conf section describe how job configs can be set, i.e. command line, programmatically, config files, etc.?
record-oriented view for the ->
record-oriented view to the
write out the output files ->
write the output files
Tasks' Side-Effect Files ->
Task Side-Effect Files
Some applications need ->
In some applications the tasks need
To avoid thes issues ->
To avoid these issues
completion of the task-attempt ->
completion of the task-attempt,
Applications specify the files, via urls (hdfs:// or http://) to be cached via the <code>JobConf</code> ->
Applications specify the files to be cached via urls (hdfs:// or http://) configured in the <code>JobConf</code>
are only copied once per job and the ability to cache archives which are un-archived on the slaves ->
are copied (and un-archived if necessary) only once per job on each slave
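The "only once per job on each slave" behaviour can be illustrated with a small per-node cache sketch: every task on a node asks for the file, but only the first request for a given (job, URI) pair triggers a copy. The `fetch` function stands in for copying and, for archives, un-archiving; it and all names and paths below are hypothetical, not the DistributedCache API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of per-node caching: each (job, URI) pair is fetched at most once on a
// node, however many of that job's tasks run there. Hypothetical; not DistributedCache.
class NodeCacheSketch {
    private final Map<String, String> localPaths = new HashMap<>();
    private final Function<String, String> fetch; // stand-in for copy (+ un-archive)
    int fetchCount = 0;

    NodeCacheSketch(Function<String, String> fetch) {
        this.fetch = fetch;
    }

    // Called by every task that needs the file; only the first call copies it.
    String localize(String jobId, String uri) {
        return localPaths.computeIfAbsent(jobId + "|" + uri, k -> {
            fetchCount++;
            return fetch.apply(uri);
        });
    }

    public static void main(String[] args) {
        NodeCacheSketch node = new NodeCacheSketch(uri -> "/tmp/cache/" + uri.hashCode());
        for (int task = 0; task < 5; task++) {
            node.localize("job_1", "hdfs://namenode/dict.txt");
        }
        System.out.println(node.fetchCount); // 1
    }
}
```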
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch
[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1917:
----------------------------------
Status: Open (was: Patch Available)
Minor whitespace-related issues.
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch
[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1917:
----------------------------------
Attachment: HADOOP-1917_7_20071110.patch
Updated patch, incorporating Doug's feedback.
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch
[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1917:
----------------------------------
Status: Patch Available (was: Open)
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch
[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540513 ]
Nigel Daley commented on HADOOP-1917:
-------------------------------------
+1 (doc review)
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch
[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1917:
----------------------------------
Attachment: HADOOP-1917_2_20071031.patch
Thanks for the review, Nigel...
Here is an updated patch which incorporates Nigel's feedback and has a first-cut of {{mapred_tutorial.html}}.
I've decided to skip the tuning bit for now. I'd like to see HADOOP-2122 go in before attempting a tuning guide... seems a bit premature for *now*.
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch
[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1917:
----------------------------------
Attachment: HADOOP-1917_3_20071031.patch
Updated patch...
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch
[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540506 ]
Hadoop QA commented on HADOOP-1917:
-----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12369010/HADOOP-1917_6_20071106.patch
against trunk revision r592324.
@author +1. The patch does not contain any @author tags.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new compiler warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests -1. The patch failed contrib unit tests.
Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1068/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1068/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1068/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1068/console
This message is automatically generated.
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch
[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1917:
----------------------------------
Status: Open (was: Patch Available)
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch
[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1917:
----------------------------------
Attachment: HADOOP-1917_6_20071106.patch
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch
[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1917:
----------------------------------
Status: Patch Available (was: Open)
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch
[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539186 ]
Nigel Daley commented on HADOOP-1917:
-------------------------------------
Here is feedback on the first half of the mapred tutorial from HADOOP-1917_2_20071031.patch:
"serve as a Tutorial" -> "serve as a tutorial"
up-and-running -> running
parallelly -> in parallel
built of commodity -> built with commodity
which processed -> which are processed
in completely -> in a completely
The frameworks sorts -> The framework sorts
in a FileSystem -> in a filesystem
re-executes the failed ones -> re-execution of the failed ones
Normally, the -> Typically the
Hence the framework -> This configuration enables the framework to
of a master -> of a single master
per node in the cluster -> per cluster node
scheduling the jobs' -> scheduling the job's
interfaces/classes -> interfaces or abstract classes
This, and other facets -> These, and other parameter
& monitoring -> and monitoring (appears in a number of places)
to the job-client etc. -> to the job client. (either remove "etc." or expand it out to list more items sent to the job client)
make Hadoop Streaming and Hadoop Pipes sentences bullet points.
I haven't compiled the Forrest docs. Do these types of URLs work?
api/index.html?org/apache/hadoop/streaming/package-summary.html
and/or the reducer. -> and/or the reducer function.
try to avoid <code>interface or class name</code>s (followed by an s).
The <code>key</code>s and <code>value</code>s -> The key and value classes
Additionally the <code>key</code>s -> Additionally, the key class
have to be -> have to implement (then remove trailing 's' from WritableComparable)
Input & Output -> Input and Output
Lets walk through a simple Map-Reduce application before we jump into details to get a flavour for how they work. ->
Before jumping into details, let's walk through a simple Map-Reduce example to get a flavour for how they work.
WalkThrough -> Walk-through
perhaps you should first talk about what inputs are passed to the map method.
line nos. -> lines (IMO this simplifies the reading)
line no. -> line
line# -> line
output of the each -> output of each
(same as the -> (the combiner is the same as the
you don't introduce the concept of a combiner -- that may need more explanation (or leave it out of this tutorial)
(word) -> (or word in this example)
of the program -> method
with the given -> method with the given
interfaces/classes -> interfaces and classes (appears in a number of places in different orders)
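Two of the points above (what inputs the map method receives, and the combiner being the same as the reducer) can be made concrete with a small, Hadoop-free Java sketch of WordCount. It is an illustration under stated assumptions, not the tutorial's code and not the Hadoop API: it assumes line-oriented input where each map call gets the line's byte offset as the key and the line text as the value, and every class and method name here is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of WordCount: map -> (combine) -> sort -> reduce.
public class WordCountSketch {

    // A (key, value) pair emitted by map, combine or reduce.
    record Pair(String key, int value) {}

    // map(offset, line): key = byte offset of the line, value = the line text.
    // Emits (word, 1) for every word; the offset is not needed for counting.
    static List<Pair> map(long offset, String line) {
        List<Pair> out = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new Pair(word, 1));
            }
        }
        return out;
    }

    // The reduce function: sums the counts per word.
    // The combiner is this same function, applied to one map task's output.
    static Map<String, Integer> sumByKey(List<Pair> pairs) {
        Map<String, Integer> sums = new TreeMap<>(); // sorted keys, like the framework's sort
        for (Pair p : pairs) {
            sums.merge(p.key(), p.value(), Integer::sum);
        }
        return sums;
    }

    // Runs one simulated map task per input split, then a single reduce.
    public static Map<String, Integer> run(List<String> splits, boolean useCombiner) {
        List<Pair> shuffled = new ArrayList<>();
        for (String split : splits) {
            List<Pair> mapOutput = new ArrayList<>();
            long offset = 0;
            for (String line : split.split("\n")) {
                mapOutput.addAll(map(offset, line));
                // Next line starts after this line's bytes plus the newline.
                offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            }
            if (useCombiner) {
                // Local aggregation: fewer pairs leave this map task.
                sumByKey(mapOutput).forEach((w, c) -> shuffled.add(new Pair(w, c)));
            } else {
                shuffled.addAll(mapOutput);
            }
        }
        return sumByKey(shuffled); // the reduce phase
    }

    public static void main(String[] args) {
        List<String> splits = List.of("Hello World Bye World", "Hello Hadoop Goodbye Hadoop");
        // Both calls print {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
        System.out.println(run(splits, false));
        System.out.println(run(splits, true));
    }
}
```

Both runs give the same counts; the combiner only reduces how much data is shuffled to the reduce phase. Reusing the reducer as the combiner is safe here because summing is associative and commutative, which is not true of every reduce function.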
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch
[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542214 ]
Hudson commented on HADOOP-1917:
--------------------------------
Integrated in Hadoop-Nightly #302 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/302/])
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: documentation
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.15.1
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch
[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541959 ]
Doug Cutting commented on HADOOP-1917:
--------------------------------------
+1 This is great documentation to have.
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: documentation
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.15.1
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch
[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1917:
----------------------------------
Fix Version/s: (was: 0.16.0)
0.15.1
Marking this for 0.15.1, might as well... since it's only a documentation patch.
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.15.1
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch
[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542772 ]
Hudson commented on HADOOP-1917:
--------------------------------
Integrated in Hadoop-Nightly #304 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/304/])
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: documentation
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.15.1
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch
[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1917:
----------------------------------
Status: Patch Available (was: Open)
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch
[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1917:
----------------------------------
Attachment: HADOOP-1917_5_20071106.patch
Another go at it... incorporated Nigel's latest comments, simplified {{WordCount}} by removing {{Tool}}-related stuff and added an advanced {{WordCount}} to demonstrate features like {{Tool}}, {{DistributedCache}}, {{Counters}} etc.
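For the advanced WordCount, the main thing Tool buys is that generic options such as -D property=value are parsed into the job configuration before the application's own run(String[]) sees the remaining arguments; in Hadoop, ToolRunner hands this parsing to GenericOptionsParser. The Java sketch below is a hypothetical, Hadoop-free analog of that split only: MiniTool and MiniToolRunner are invented names, and only the spaced form "-D name=value" is handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical analog of Tool/ToolRunner (not the Hadoop classes).
public class MiniToolRunner {

    // Stand-in for an application implementing a Tool-like interface.
    public interface MiniTool {
        int run(Map<String, String> conf, String[] args);
    }

    // Moves "-D name=value" pairs into a configuration map and passes
    // every other argument, in order, to the tool's run() method.
    public static int run(MiniTool tool, String[] args) {
        Map<String, String> conf = new HashMap<>();
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-D") && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2);
                if (kv.length != 2) {
                    throw new IllegalArgumentException("expected name=value after -D: " + args[i]);
                }
                conf.put(kv[0], kv[1]);
            } else {
                remaining.add(args[i]);
            }
        }
        return tool.run(conf, remaining.toArray(new String[0]));
    }

    public static void main(String[] args) {
        String[] cmd = {"-D", "mapred.reduce.tasks=2", "input", "output"};
        int rc = run((conf, rest) -> {
            // conf holds mapred.reduce.tasks=2; rest holds only [input, output]
            System.out.println(conf + " " + Arrays.toString(rest));
            return 0;
        }, cmd);
        System.out.println("exit code " + rc);
    }
}
```

The design point is that the application code never has to recognise framework options itself, which is also why a basic WordCount can skip Tool entirely without losing anything essential.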
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch
[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1917:
----------------------------------
Attachment: HADOOP-1917_7_20071110.patch
Forgot to grant the license to the ASF...
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch
[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1917:
----------------------------------
Attachment: HADOOP-1917_1_20071025.patch
Here is an early patch with some Forrest-based guides, to get some feedback.
It introduces:
* {{quickstart.html}} - For first-time users, including details on single-node setup etc.
* {{setup.html}} - Helps admins set up non-trivial Hadoop clusters
Todo:
* {{mapred-tutorial.html}} - Extensive tutorial on Map-Reduce, including a walk-through of some examples to help users understand and implement applications.
* {{tuning.html}} - Documentation of various hdfs/mapred parameters.
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch
[jira] Resolved: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Devaraj Das resolved HADOOP-1917.
---------------------------------
Resolution: Duplicate
This issue is handled in HADOOP-1861 and HADOOP-2046
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Priority: Critical
> Fix For: 0.15.0
[jira] Reopened: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy reopened HADOOP-1917:
-----------------------------------
Assignee: Arun C Murthy
I'll resurrect this jira and use it to track the Hadoop configuration and user guides.
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.15.0
[jira] Issue Comment Edited: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539171 ]
milindb edited comment on HADOOP-1917 at 10/31/07 1:10 PM:
---------------------------------------------------------------------
Comments on HADOOP-1917
Overview.html:
"Hadoop was been" -> "Hadoop has been"
"Optionally install rsync must be installed" -> "Optionally install rsync"
"build it with ant" -> what's the ant target?
What's the default for HADOOP_LOG_DIR?
"$ bin/hadoop dfs -put input input" -> "$ bin/hadoop dfs -put conf input"
Should there be a step to examine the web UI for the JobTracker and NameNode?
setup.html:
HADOOP_HEAPSIZE -> need some typical values here?
"where the NameNode stores the name table" -> "where the NameNode persistently stores the namespace and transaction logs"
"server and client machines." -> need to document early that NameNode and JobTracker are server machines, and "DataNode+TaskTracker" are client machines
"slave processors" -> please use consistent terminology, prefer "worker" to "slave"
Argh... the name "slaves" is hardcoded as the file name conf/slaves in Hadoop. I should probably file a jira.
Also, mapred.map.tasks and mapred.reduce.tasks should *not* be marked final in typical cases.
mapred_tutorial.html:
Consider removing the Google MapReduce paper link as a prerequisite, since the goal of the tutorial is to provide all the information needed to understand Map-Reduce.
A picture would help in the overview.
In the Input and Output section, remove the use of combiner.
In the wordcount example, simplify it even more by avoiding the use of ToolRunner
"submission amp;" -> "submission and"
"de-initialization" -> "finalization? clean-up?"
Wherever overriding is mentioned, also mention the default value, e.g. partitioner, inputformat, inputsplit etc.
Please provide a javadoc link to DistributedCache at the first mention.
Overall comments: This is extremely useful. However, the level of detail is overwhelming for a Map-Reduce tutorial. Maybe split it into two, Basic and Advanced? Basic should be enough to understand WordCount, and Advanced should then go into all the details.
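The remark that mapred.map.tasks and mapred.reduce.tasks should not usually be marked final follows from how final parameters work: configuration resources are loaded in order, and once one resource declares a property final, later resources cannot change it, so a job could no longer choose its own task counts. A minimal Hadoop-free model of that rule follows; LayeredConf is a hypothetical class rather than org.apache.hadoop.conf.Configuration, and the property values are illustrative.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of layered configuration resources with "final" properties.
public class LayeredConf {
    private final Map<String, String> props = new HashMap<>();
    private final Set<String> finalProps = new HashSet<>();

    // Applies one resource (e.g. site settings, then a job's settings) on top of earlier ones.
    // 'finals' names the properties this resource marks as final.
    public void load(Map<String, String> resource, Set<String> finals) {
        for (Map.Entry<String, String> e : resource.entrySet()) {
            String name = e.getKey();
            if (finalProps.contains(name)) {
                continue; // an earlier resource marked it final: the override is ignored
            }
            props.put(name, e.getValue());
            if (finals.contains(name)) {
                finalProps.add(name);
            }
        }
    }

    public String get(String name) {
        return props.get(name);
    }

    public static void main(String[] args) {
        LayeredConf conf = new LayeredConf();
        // Site-wide settings: the admin marks mapred.reduce.tasks final.
        conf.load(Map.of("mapred.map.tasks", "2", "mapred.reduce.tasks", "1"),
                  Set.of("mapred.reduce.tasks"));
        // A job then tries to change both values.
        conf.load(Map.of("mapred.map.tasks", "100", "mapred.reduce.tasks", "10"), Set.of());
        System.out.println(conf.get("mapred.map.tasks"));    // 100: not final, the job's value wins
        System.out.println(conf.get("mapred.reduce.tasks")); // 1: final, the job's value is ignored
    }
}
```

The job's map-task count takes effect while its reduce-task count is silently discarded, which is usually not what an admin wants for per-job tuning knobs; final is better reserved for values user applications must not alter.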
was (Author: milindb):
Comments on HADOOP-1917
Overview.html:
"Hadoop was been" -> "Hadoop has been"
"Optionally install rsync must be installed" _> "Optionally install rsync"
"build it with ant" -> *whats the ant target ?*
what's the default for HADOOP_LOG_DIR ?
"$ bin/hadoop dfs -put input input" -> "$ bin/hadoop dfs -put conf input"
should there be a step to examine web-ui for JT and NN ?
setup.html:
HADOOP_HEAPSIZE -> *need some typical values here ?*
"where the NameNode stores the name table" -> "where the NameNode stores the namespace and transactions logs persistently"
"server and client machines." -> *need to document early that NameNode and JobTracker are server machines, and "DataNode+TaskTracker" are client machines*
"slave processors" -> *please use consistent terminology, prefer "worker" to "slave"*
*argh.. "slaves" name is hardcoded as a file name conf/slaves in hadoop. I should probably file a jira*
Also, mapred.map.tasks and mapred.reduce.tasks should *not* be marked final in typical cases.
mapred_tutorial.html:
*consider removing google mapreduce paper link as prerequisite, since the goal of the tutorial is to provide all the information needed to understand map-reduce*
*A picture would help in the overview.*
*In the Input and Output section, remove the use of combiner.*
*In the wordcount example, simplify it even more by avoiding the use of ToolRunner*
"submission amp;" -> "submission and"
"de-initialization" -> "finalization? clean-up?"
*wherever overriding is mentioned, also mention the default value, e.g. partitioner, inputformat, inputsplit etc.*
*please provide a javadoc link to DistributedCache at the first mention*
Overall comments: This is extremely useful. However, the level of detail is overwhelming for a Mapreduce tutorial. Maybe split this into two: Basic and Advanced? Basic should be enough to understand WordCount, and Advanced should then go into all the details.
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admins perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in forest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience). Committers should look for user documentation before accepting patches.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540107 ]
Hadoop QA commented on HADOOP-1917:
-----------------------------------
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12368953/HADOOP-1917_4_20071105.patch
against trunk revision r591722.
@author +1. The patch does not contain any @author tags.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new compiler warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1061/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1061/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1061/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1061/console
This message is automatically generated.
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch
[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1917:
----------------------------------
Status: Open (was: Patch Available)
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch
[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540283 ]
Hadoop QA commented on HADOOP-1917:
-----------------------------------
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12368996/HADOOP-1917_5_20071106.patch
against trunk revision r591880.
@author +1. The patch does not contain any @author tags.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new compiler warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1065/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1065/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1065/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1065/console
This message is automatically generated.
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch
[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538656 ]
Nigel Daley commented on HADOOP-1917:
-------------------------------------
Looks good. quickstart.html comments:
replace <strong> with <code> in a number of places (environment variables, commands, etc)
"framework i.e. perform" ->
TM after Java only needs to be on the first occurrence.
"preferably from Sun." -> "preferably from Sun, must be installed."
sshd -> <code>sshd</code>
"must be installed to manage" -> "to manage"
"to use Hadoop's scripts to manage" -> "to use the Hadoop scripts that manage"
http://cvs.apache.org/dist/lucene/hadoop/nightly -> http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/
"Edit the file" -> "In the unpacked release, edit the file"
"display the documentation" -> "display the usage documentation"
"This is useful for debugging, and can be demonstrated as follows:" -> "This is useful for debugging. The following example copies the unpacked <code>conf</code> directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given <code>output</code> directory."
remove "This will display counts..."
"can be completely run on a single-node in a pseudo-distributed mode:" -> "can also be run on a single-node in a pseudo-distributed mode (each Hadoop daemon runs in a separate Java process):"
"Use the following" -> "In the unpacked release, edit the file"
"A new distributed filesystem must be formatted with the following command:" -> "Format a new Hadoop distributed filesystem:"
"The hadoop daemons are started with the following command:" -> "Start the hadoop daemons:"
"Input files are copied into the distributed filesystem as follows:" -> "Copy input files into the distributed filesystem as follows:"
"Output files are copied from the distributed filesystem as follows:" -> "Optionally, you can copy output files from the distributed filesystem as follows:"
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch
[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540897 ]
Doug Cutting commented on HADOOP-1917:
--------------------------------------
quickstart.html:
We must, for legal reasons, distinguish between developer documentation and end-user documentation. End users should not be encouraged to use subversion or nightly releases, since software obtained that way is not distributed under the Apache license as a release is. So it's best to keep mention of subversion and nightly releases in developer-specific documentation.
We don't document the use of 'rsync' to automatically update Hadoop software on slave nodes at startup, so perhaps we should remove mention of the dependency on rsync from the docs? (Does anyone use this feature anymore? If not, perhaps we should remove it.)
setup.html:
The page linked to by the "Install & Configure" menu item is titled "Hadoop Cluster - Setup". Perhaps both should be replaced with something like "Cluster Configuration", and the file should be called "cluster-config.html"? In general, the menu items don't need to use the term "Hadoop", while the page titles probably should.
The documentation of mapred.map.tasks and mapred.reduce.tasks here should link to the more detailed description in the mapred tutorial.
mapred_tutorial.html:
The menu entry for this should be just "Map/Reduce Tutorial", without "Hadoop".
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch
[jira] Updated: (HADOOP-1917) Need configuration guides for Hadoop
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1917:
----------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
I just committed this.
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: documentation
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.15.1
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch, HADOOP-1917_5_20071106.patch, HADOOP-1917_6_20071106.patch, HADOOP-1917_7_20071110.patch