Posted to dev@nutch.apache.org by "Stijn Vermeeren (JIRA)" <ji...@apache.org> on 2008/08/06 14:48:44 UTC

[jira] Updated: (NUTCH-640) confusing description "set it to Integer.MAX_VALUE"

     [ https://issues.apache.org/jira/browse/NUTCH-640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stijn Vermeeren updated NUTCH-640:
----------------------------------

    Priority: Minor  (was: Major)

> confusing description "set it to Integer.MAX_VALUE"
> ---------------------------------------------------
>
>                 Key: NUTCH-640
>                 URL: https://issues.apache.org/jira/browse/NUTCH-640
>             Project: Nutch
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 0.9.0
>            Reporter: Stijn Vermeeren
>            Priority: Minor
>
> The property "indexer.max.tokens" has the following description in nutch-default.xml:
> " The maximum number of tokens that will be indexed for a single field
>   in a document. This limits the amount of memory required for
>   indexing, so that collections with very large files will not crash
>   the indexing process by running out of memory.
>   Note that this effectively truncates large documents, excluding
>   from the index tokens that occur further in the document. If you
>   know your source documents are large, be sure to set this value
>   high enough to accomodate the expected size. If you set it to
>   Integer.MAX_VALUE, then the only limit is your memory, but you
>   should anticipate an OutOfMemoryError."
> Apparently, "set it to Integer.MAX_VALUE" here means <<substitute the integer value of Integer.MAX_VALUE, i.e. 2147483647>>, and not <<put the text "Integer.MAX_VALUE" between the value tags>>. I think this is very confusing, and the description should be improved.
> I first put <value>Integer.MAX_VALUE</value> in my configuration, and it took a long time to figure out what was wrong, especially since Nutch silently fell back to the default value of 10000 instead of reporting an error.
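
The behaviour described above can be reproduced with a minimal sketch. It assumes the property is read by a parse-with-default lookup; the class MaxTokensDemo and the helper parseOrDefault are hypothetical names made up for illustration and are not Nutch or Hadoop code. The sketch shows why the text "Integer.MAX_VALUE" ends up as the default, while the literal number 2147483647 is accepted.

```java
public class MaxTokensDemo {

    // Hypothetical helper (not the Nutch/Hadoop API): parse an integer
    // property value, returning the default when the value is missing or
    // is not a valid decimal integer.
    static int parseOrDefault(String raw, int defaultValue) {
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // The bad value is swallowed silently, which is what makes
            // the misconfiguration hard to spot.
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        int defaultMaxTokens = 10000; // default mentioned in the report

        // <value>Integer.MAX_VALUE</value>: the text is not a number.
        System.out.println(parseOrDefault("Integer.MAX_VALUE", defaultMaxTokens));

        // <value>2147483647</value>: the numeric value of Integer.MAX_VALUE.
        System.out.println(parseOrDefault(String.valueOf(Integer.MAX_VALUE), defaultMaxTokens));
    }
}
```

Under this assumption, the configuration entry has to contain the number itself, i.e. <value>2147483647</value>, rather than the Java constant name.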

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.