Posted to dev@nutch.apache.org by eyal edri <ey...@gmail.com> on 2007/10/21 10:39:57 UTC
How to write a parse plugin and not get NullPointerException on ParseData
Hi,
I've written a parse-exe plugin for downloading EXE files from crawled
pages, using parse-pdf as my template.
Although the plugin works (it downloads the exe for any related content type,
i.e. application/(x-exe|x-msdos|x-dosexec..)), I still get a
NullPointerException for ParseData.
I don't fully understand the code at the end, and I might have missed
something. Can anyone help?
The getParse(Content content) I've written:
public Parse getParse(Content content) {
    String resultText = "No textual content available";
    String resultTitle = "No textual content available";
    Outlink[] outlinks = new Outlink[0];
    Metadata metadata = new Metadata();
    try {
        byte[] raw = content.getContent();
        String contentLength = content.getMetadata().get(
            Response.CONTENT_LENGTH);
        // fail if fewer/more bytes were fetched than the server announced
        if (contentLength != null
                && raw.length != Integer.parseInt(contentLength)) {
            return new ParseStatus(ParseStatus.FAILED,
                ParseStatus.FAILED_TRUNCATED,
                "Content truncated at " + raw.length
                + " bytes. Parser can't handle incomplete exe file.")
                .getEmptyParse(getConf());
        }
        // download the file - separate method (doesn't affect the other vars)
        downloadContentType(content);
    } catch (Exception e) { // runtime exception
        if (LOG.isWarnEnabled()) {
            LOG.warn("General exception in EXE parser: " + e.getMessage());
            e.printStackTrace(LogUtil.getWarnStream(LOG));
        }
        return new ParseStatus(ParseStatus.FAILED,
            "Can't be handled as exe document. " + e)
            .getEmptyParse(getConf());
    }
    ParseData parseData = new ParseData(ParseStatus.STATUS_SUCCESS,
        resultTitle, outlinks,
        content.getMetadata());
    return new ParseImpl(resultText, parseData);
}
When running, I get this exception:
java.lang.NullPointerException
    at org.apache.nutch.parse.ParseData.write(ParseData.java:163)
    at org.apache.nutch.parse.ParseImpl.write(ParseImpl.java:55)
    at org.apache.nutch.fetcher.FetcherOutput.write(FetcherOutput.java:63)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:315)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:403)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:164)
fetch of http://www2.ati.com/misc/themes/ATI_ThemeManager_July2004.exe failed with:
java.lang.NullPointerException
Thanks,
Eyal.
--
Eyal Edri
Re: How to write a parse plugin and not get NullPointerException on ParseData
Posted by eyal edri <ey...@gmail.com>.
Hi,
I'm interested in submitting the parse-exe (download exe) plugin to the
wiki, but I still have one small issue to solve before doing so.
Can anyone help with the NullPointerException issue?
Thanks,
--
Eyal Edri