Posted to java-user@lucene.apache.org by jake dsouza <ja...@gmail.com> on 2012/04/16 19:00:33 UTC

Possible Unhandled Exception in org.apache.lucene.benchmark.byTask.feeds.DemoHTMLParser

Hi all,

I am trying to index the TREC GOV2 data set and I am getting a few
exceptions from this class. Please see the stack trace below:

Apr 16, 2012 5:32:55 AM
java.lang.NullPointerException
    at org.apache.lucene.benchmark.byTask.feeds.DemoHTMLParser.parse(DemoHTMLParser.java:55)
    at org.apache.lucene.benchmark.byTask.feeds.TrecGov2Parser.parse(TrecGov2Parser.java:56)
    at org.apache.lucene.benchmark.byTask.feeds.TrecParserByPath.parse(TrecParserByPath.java:30)
    at org.apache.lucene.benchmark.byTask.feeds.TrecContentSource.getNextDocData(TrecContentSource.java:292)
    at com.Gov2Reader.indexDocs(Gov2Reader.java:117)

From what I can see, line 56 of DemoHTMLParser contains

    date = dateFormat.parse(props.getProperty("date").trim());

but in this case dateFormat is null, which is why the exception is
thrown. The parse method in TrecGov2Parser passes null for the date
format when it calls DemoHTMLParser.parse.
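
To illustrate the kind of guard that would avoid this, here is a minimal
sketch. It is not the Lucene code: DateGuard and parseDateOrNull are
hypothetical names I made up, and I am assuming that a missing format or a
missing "date" property should simply yield no date rather than an error.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.util.Date;
import java.util.Properties;

// Hypothetical helper, not part of Lucene.
final class DateGuard {

    // Returns the parsed "date" property, or null when the format is null,
    // the property is absent, or the value cannot be parsed.
    static Date parseDateOrNull(DateFormat dateFormat, Properties props) {
        if (dateFormat == null || props == null) {
            return null; // the case that currently triggers the NullPointerException
        }
        String raw = props.getProperty("date");
        if (raw == null) {
            return null; // document carries no date metadata
        }
        try {
            return dateFormat.parse(raw.trim());
        } catch (ParseException e) {
            return null; // unparseable date: treat as missing
        }
    }
}
```

With a check like this, a document without a usable date would still be
returned to the caller instead of being dropped.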

Because of this exception, some documents are skipped and never indexed.

Regards
Jake