Posted to dev@nutch.apache.org by "Stijn Vermeeren (JIRA)" <ji...@apache.org> on 2008/08/06 14:48:44 UTC
[jira] Updated: (NUTCH-640) confusing description "set it to Integer.MAX_VALUE"
[ https://issues.apache.org/jira/browse/NUTCH-640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stijn Vermeeren updated NUTCH-640:
----------------------------------
Priority: Minor (was: Major)
> confusing description "set it to Integer.MAX_VALUE"
> ---------------------------------------------------
>
> Key: NUTCH-640
> URL: https://issues.apache.org/jira/browse/NUTCH-640
> Project: Nutch
> Issue Type: Improvement
> Components: documentation
> Affects Versions: 0.9.0
> Reporter: Stijn Vermeeren
> Priority: Minor
>
> This property "indexer.max.tokens" has the following description in nutch-default.xml :
> " The maximum number of tokens that will be indexed for a single field
> in a document. This limits the amount of memory required for
> indexing, so that collections with very large files will not crash
> the indexing process by running out of memory.
> Note that this effectively truncates large documents, excluding
> from the index tokens that occur further in the document. If you
> know your source documents are large, be sure to set this value
> high enough to accomodate the expected size. If you set it to
> Integer.MAX_VALUE, then the only limit is your memory, but you
> should anticipate an OutOfMemoryError."
> Apparently, "set it to Integer.MAX_VALUE" here means <<substitute the integer value of Integer.MAX_VALUE>>, and not <<put the text "Integer.MAX_VALUE" between the value tags>>. I think this is very confusing and the description should be improved.
> I first put <value>Integer.MAX_VALUE</value> in my configuration, and it took a long time to figure out what was wrong, especially since Nutch fell back to the default value of 10000 instead of reporting an error.
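
To illustrate the behaviour described above, here is a minimal Java sketch. It assumes the configuration reader treats a value it cannot parse as an integer as if it were missing, which matches the silent fallback to 10000 reported here. The class MaxTokensDemo and the helper parseIntOrDefault are hypothetical names used only for this illustration; they are not Nutch or Hadoop code.

```java
// Hypothetical illustration only: not taken from Nutch or Hadoop.
class MaxTokensDemo {

    // Parses raw as a decimal int. Returns defaultValue when raw is
    // null or is not a valid integer literal, instead of throwing.
    static int parseIntOrDefault(String raw, int defaultValue) {
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // The symbolic name "Integer.MAX_VALUE" ends up here.
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        int fallback = 10000; // default value mentioned in the report

        // The literal text of the constant name is not a number.
        System.out.println(parseIntOrDefault("Integer.MAX_VALUE", fallback)); // prints 10000

        // The numeric value of the constant parses as expected.
        System.out.println(parseIntOrDefault(String.valueOf(Integer.MAX_VALUE), fallback)); // prints 2147483647
    }
}
```

Under that assumption, the value to place between the value tags is the number 2147483647 (the value of Integer.MAX_VALUE in Java), not the name of the constant.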