Posted to dev@nutch.apache.org by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2011/04/01 16:41:06 UTC

[jira] [Closed] (NUTCH-326) WordExtractor throws java.util.NoSuchElementException on some documents

     [ https://issues.apache.org/jira/browse/NUTCH-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma closed NUTCH-326.
-------------------------------

    Resolution: Won't Fix

> WordExtractor throws java.util.NoSuchElementException on some documents
> -----------------------------------------------------------------------
>
>                 Key: NUTCH-326
>                 URL: https://issues.apache.org/jira/browse/NUTCH-326
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 0.7.1, 0.7.2
>            Reporter: Tom Jensen
>            Priority: Minor
>
> At line 156 in org.apache.nutch.parse.msword.WordExtractor, a java.util.NoSuchElementException is occasionally thrown because the code does not check whether the Iterator has been exhausted.  I suggest adding this:
>         if (!textIt.hasNext()) {
>             break;
>         }
> just before line 156.  I tested this with the problem Word documents: the exceptions were no longer thrown and the text was extracted successfully.  Documents whose text was already extracted successfully continued to work.
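
For illustration only, here is a minimal sketch of the guard suggested in the description. It is not the actual WordExtractor code: the class GuardedIteration, the method extract, and the expectedPieces parameter are hypothetical stand-ins for a loop that expects more elements than the iterator may hold. Only the name textIt and the hasNext() check come from the report.

```java
import java.util.Iterator;

// Hypothetical stand-in for the extractor loop; not the Nutch source.
class GuardedIteration {

    // Appends up to expectedPieces elements from textIt to the result.
    // expectedPieces is an assumed count derived elsewhere (for example
    // from document metadata) and may exceed what the iterator holds.
    static String extract(Iterator<String> textIt, int expectedPieces) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < expectedPieces; i++) {
            // The suggested guard: stop instead of letting next() throw
            // java.util.NoSuchElementException on an exhausted iterator.
            if (!textIt.hasNext()) {
                break;
            }
            out.append(textIt.next());
        }
        return out.toString();
    }
}
```

With the guard in place, a call that asks for more pieces than exist returns the text collected so far; without it, the same call would fail on the first next() past the end.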

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira