Posted to dev@nutch.apache.org by "Tom Jensen (JIRA)" <ji...@apache.org> on 2006/07/21 23:59:13 UTC
[jira] Created: (NUTCH-326) WordExtractor throws java.util.NoSuchElementException on some documents
WordExtractor throws java.util.NoSuchElementException on some documents
-----------------------------------------------------------------------
Key: NUTCH-326
URL: http://issues.apache.org/jira/browse/NUTCH-326
Project: Nutch
Issue Type: Bug
Components: indexer
Affects Versions: 0.7.2, 0.7.1
Reporter: Tom Jensen
Priority: Minor
At line 156, org.apache.nutch.parse.msword.WordExtractor occasionally throws a java.util.NoSuchElementException because it never checks whether the Iterator has been exhausted before calling next(). Suggest adding this:
if (!textIt.hasNext()) {
break;
}
just before line 156. Tested with the problematic Word documents: the exceptions were no longer thrown and their text was extracted successfully. Documents whose text had already been extracted successfully before the change continued to extract correctly.
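The issue does not quote the loop surrounding line 156, so the following is a minimal, self-contained Java sketch of the same pattern rather than the actual Nutch code. It assumes, hypothetically, that the loop is bounded by a separate count, so the iterator can run out before the loop ends; the class and method names (GuardedIteratorDemo, readPiecesUnguarded, readPiecesGuarded) are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class GuardedIteratorDemo {

    // Hypothetical stand-in for the text-walking loop: the loop expects
    // `expected` pieces, but the document may supply fewer.
    public static List<String> readPiecesUnguarded(Iterator<String> textIt, int expected) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < expected; i++) {
            // next() throws NoSuchElementException once the iterator is exhausted
            out.add(textIt.next());
        }
        return out;
    }

    // Same loop with the guard proposed in this issue.
    public static List<String> readPiecesGuarded(Iterator<String> textIt, int expected) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < expected; i++) {
            if (!textIt.hasNext()) {
                break; // stop cleanly instead of calling next() on an empty iterator
            }
            out.add(textIt.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> pieces = Arrays.asList("Hello ", "world");
        try {
            readPiecesUnguarded(pieces.iterator(), 3);
            System.out.println("unguarded: no exception");
        } catch (NoSuchElementException e) {
            System.out.println("unguarded: NoSuchElementException");
        }
        List<String> guarded = readPiecesGuarded(pieces.iterator(), 3);
        System.out.println("guarded: " + String.join("", guarded));
    }
}
```

Running main prints "unguarded: NoSuchElementException" followed by "guarded: Hello world". The guard turns a crash into a shorter result, so a document that supplies fewer pieces than expected is silently accepted with whatever text was available, which matches the behaviour reported above.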
--
This message is automatically generated by JIRA.