Posted to dev@nutch.apache.org by "Tom Jensen (JIRA)" <ji...@apache.org> on 2006/07/21 23:59:13 UTC

[jira] Created: (NUTCH-326) WordExtractor throws java.util.NoSuchElementException on some documents

WordExtractor throws java.util.NoSuchElementException on some documents
-----------------------------------------------------------------------

                 Key: NUTCH-326
                 URL: http://issues.apache.org/jira/browse/NUTCH-326
             Project: Nutch
          Issue Type: Bug
          Components: indexer
    Affects Versions: 0.7.2, 0.7.1
            Reporter: Tom Jensen
            Priority: Minor


Line 156 of org.apache.nutch.parse.msword.WordExtractor occasionally throws a java.util.NoSuchElementException because nothing checks whether the Iterator has been exhausted.  I suggest adding this check:

        if (!textIt.hasNext()) {
            break;
        }

just before line 156.  I tested the change with the problem Word documents: the exceptions were no longer thrown and the text was extracted successfully.  Documents whose text was extracted successfully before the change continued to extract correctly.
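
For illustration only, here is a minimal, self-contained sketch of the guard pattern.  It is not the actual WordExtractor code: the class GuardedIteratorSketch, the method collect, and the parameter pieceCount are hypothetical stand-ins for a loop that expects more elements than the Iterator holds.  Only the name textIt and the guard itself come from the report.  The sketch uses generics for brevity, which the affected Nutch versions may not.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class GuardedIteratorSketch {

    // Hypothetical stand-in for the extraction loop: pieceCount is how many
    // text pieces the loop expects, textIt supplies the pieces actually present.
    public static String collect(int pieceCount, Iterator<String> textIt) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < pieceCount; i++) {
            // The suggested guard: leave the loop once the iterator is
            // exhausted instead of letting next() throw NoSuchElementException.
            if (!textIt.hasNext()) {
                break;
            }
            out.append(textIt.next());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> pieces = Arrays.asList("Hello, ", "world");
        // The loop expects three pieces but only two exist.
        System.out.println(collect(3, pieces.iterator()));
    }
}
```

Without the guard, the third call to next() in this sketch would throw java.util.NoSuchElementException; with it, the loop stops and returns the text gathered so far.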

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira