Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2011/08/09 16:04:27 UTC

[jira] [Commented] (NUTCH-881) Good quality documentation for Nutch

    [ https://issues.apache.org/jira/browse/NUTCH-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081639#comment-13081639 ] 

Lewis John McGibbney commented on NUTCH-881:
--------------------------------------------

In Nutch trunk we currently only have the wiki as a repository for any Nutch 2.0 information. Is this satisfactory?

As far as I can tell, the documentation for Gora_trunk is produced using Apache Forrest. I am reasonably familiar with Forrest, and it would be a great benefit, as well as lessening the burden on the mailing lists, if we could maintain a clean distribution of documentation bundled into a /trunk/docs and/or branch-1.4/docs directory from now on and for all future official releases.

I think the only addition to the documentation we require on the website is a formal tutorial (available as part of the Apache Nutch website). We would need to add it to the /site resources, and we could then maintain it and direct users to it as a one-stop resource for the Nutch branches/tags, with a similar, separate resource for trunk.

With specific reference to Nutch trunk: by comparison, the Gora team have provided a quick-start guide followed by a more in-depth tutorial, an approach which we could apply to both branch-1.4 and 2.0 trunk. The quick-start guide would only show users how to get trunk up and running; the formal tutorial would then provide in-depth documentation on completing a crawl with either Nutch 1.4 or trunk 2.0. Does this sound reasonable?

Andrzej provided some good comments in the correspondence on NUTCH-881 which should be addressed within any comprehensive documentation. I am very happy, and pretty keen, to get this issue resolved, but I think we need to agree on the specific tasks which need to be addressed, basically laying the path for everything this issue encompasses.

> Good quality documentation for Nutch
> ------------------------------------
>
>                 Key: NUTCH-881
>                 URL: https://issues.apache.org/jira/browse/NUTCH-881
>             Project: Nutch
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 2.0
>            Reporter: Andrzej Bialecki 
>
> This is, and has been, a long-standing request from Nutch users. This becomes an acute need as we redesign Nutch 2.0, because the collective knowledge and the Wiki will no longer be useful without a massive amount of editing.
> IMHO the reference documentation should be in SVN, and not on the Wiki - the Wiki is good for casual information and recipes but I think it's too messy and not reliable enough as a reference.
> I propose to start with the following:
>  1. let's decide on the format of the docs. Each format has its own pros and cons:
>   * HTML: easy to work with, but formatting may be messy unless we edit it by hand, at which point it's no longer so easy... Good toolchains to convert to other formats, but limited expressiveness of larger structures (e.g. book, chapters, TOC, multi-column layouts, etc).
>   * Docbook: learning curve is higher, but not insurmountable... Naturally yields very good structure. Figures/diagrams may be problematic - different renderers (html, pdf) like to treat the scaling and placing somewhat differently.
>   * Wiki-style (Confluence or TWiki): easy to use, but limited control over larger structures. Maven Doxia can format cwiki, twiki, and a host of other formats to e.g. html and pdf.
>   * other?
>  2. start documenting the main tools and the main APIs (e.g. the plugins and all the extension points). We can of course reuse material from the Wiki and from various presentations (e.g. the ApacheCon slides).
