Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/03/26 19:47:16 UTC

[jira] [Comment Edited] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

    [ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614418#comment-13614418 ] 

Lewis John McGibbney edited comment on NUTCH-1547 at 3/26/13 6:46 PM:
----------------------------------------------------------------------

Is this for trunk or 2.x? We can change the affects version to reflect this.
                
      was (Author: lewismc):
    Is this for trunk or 2.x?
                  
> BasicIndexingFilter - Problem to index full title
> -------------------------------------------------
>
>                 Key: NUTCH-1547
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1547
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.6
>            Reporter: Gustavo Rauber
>            Assignee: lufeng
>            Priority: Minor
>         Attachments: NUTCH-1547.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I ran into this issue when trying to index the entire title, as can be done for the content, by setting indexer.max.title.length to -1 in nutch-default.xml. I think the title should behave the same way as the content.
> To fix it, replace line 90:
> if (title.length() > MAX_TITLE_LENGTH) {      // truncate title if needed
> with this one:
> if (MAX_TITLE_LENGTH > -1 && title.length() > MAX_TITLE_LENGTH) {      // truncate title if needed
> Stack Trace:
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> 	at java.lang.String.substring(String.java:1937)
> 	at org.apache.nutch.indexer.basic.BasicIndexingFilter.filter(BasicIndexingFilter.java:91)
> 	at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:109)
> 	at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:272)
> 	at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
> 	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
> 	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
> Cheers.
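
The guard proposed in the description can be tried in isolation. The following is a minimal sketch, not the actual BasicIndexingFilter code: the class name TitleTruncation and the method truncate are hypothetical, and it assumes the filter shortens the title with substring(0, MAX_TITLE_LENGTH), which is consistent with the StringIndexOutOfBoundsException in the stack trace but is not shown in the report.

```java
// Hypothetical stand-alone helper illustrating the proposed guard.
// It is NOT the real BasicIndexingFilter; names are chosen for illustration only.
final class TitleTruncation {

    private TitleTruncation() {}

    /**
     * Returns the title cut to at most maxTitleLength characters.
     * A negative limit is treated as "no limit" (assumption taken from the
     * proposed patch, where -1 is meant to disable truncation).
     */
    static String truncate(String title, int maxTitleLength) {
        // Without the "maxTitleLength > -1" check, a limit of -1 would reach
        // substring(0, -1) and throw StringIndexOutOfBoundsException.
        if (maxTitleLength > -1 && title.length() > maxTitleLength) {
            return title.substring(0, maxTitleLength);
        }
        return title;
    }

    public static void main(String[] args) {
        String title = "A fairly long page title";
        System.out.println(truncate(title, 8));   // limited to 8 characters
        System.out.println(truncate(title, -1));  // no limit, title kept whole
    }
}
```

With the guard in place, a limit of -1 leaves the title untouched, while non-negative limits still truncate as before.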
