Posted to dev@nutch.apache.org by eyal edri <ey...@gmail.com> on 2007/10/21 10:39:57 UTC

How to write a parse plugin and not get NullPointerException on ParseData

Hi,

I've written a parse-exe plugin for downloading EXE files from crawled
pages, using parse-pdf as my template.
Although the plugin works (it downloads the EXE for any related content type,
i.e. application/(x-exe|x-msdos|x-dosexec..)), I still get a
NullPointerException for ParseData.
I don't fully understand the code at the end, and I might have missed
something. Can anyone help?

Here is the getParse(Content content) method I've written:

  public Parse getParse(Content content) {
    String resultText = "No textual content available";
    String resultTitle = "No textual content available";
    Outlink[] outlinks = new Outlink[0];
    Metadata metadata = new Metadata();

    try {
      byte[] raw = content.getContent();

      String contentLength = content.getMetadata().get(Response.CONTENT_LENGTH);
      if (contentLength != null
          && raw.length != Integer.parseInt(contentLength)) {
        return new ParseStatus(ParseStatus.FAILED,
            ParseStatus.FAILED_TRUNCATED,
            "Content truncated at " + raw.length
            + " bytes. Parser can't handle incomplete exe file.")
            .getEmptyParse(getConf());
      }
      // download the file - separate method (doesn't affect the other vars)
      downloadContentType(content);

    } catch (Exception e) { // runtime exception
      if (LOG.isWarnEnabled()) {
        LOG.warn("General exception in EXE parser: " + e.getMessage());
        e.printStackTrace(LogUtil.getWarnStream(LOG));
      }
      return new ParseStatus(ParseStatus.FAILED,
          "Can't be handled as exe document. " + e).getEmptyParse(getConf());
    }

    ParseData parseData = new ParseData(ParseStatus.STATUS_SUCCESS,
                                        resultTitle, outlinks,
                                        content.getMetadata());

    return new ParseImpl(resultText, parseData);
  }
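
For anyone who wants to try the truncation check from the method above outside of Nutch, here is a minimal sketch of that one step. The class and method names (TruncationCheck, declaredLength, isTruncated) are hypothetical and not part of Nutch. It is also not an attempt to explain the NullPointerException; it only isolates the Content-Length comparison.

```java
import java.util.OptionalLong;

/** Hypothetical helper (not a Nutch class): isolates the Content-Length check. */
final class TruncationCheck {

    /** Parses a Content-Length value; empty if missing, malformed, or negative. */
    static OptionalLong declaredLength(String header) {
        if (header == null) {
            return OptionalLong.empty();
        }
        try {
            long n = Long.parseLong(header.trim());
            return n >= 0 ? OptionalLong.of(n) : OptionalLong.empty();
        } catch (NumberFormatException e) {
            // Assumption: an unparsable header is treated as "length unknown".
            return OptionalLong.empty();
        }
    }

    /** True only if a usable declared length exists and differs from the bytes received. */
    static boolean isTruncated(String header, byte[] raw) {
        int actual = (raw == null) ? 0 : raw.length;
        OptionalLong declared = declaredLength(header);
        return declared.isPresent() && declared.getAsLong() != actual;
    }

    public static void main(String[] args) {
        byte[] raw = new byte[5];
        System.out.println(isTruncated("10", raw));  // declared 10, received 5
        System.out.println(isTruncated("5", raw));   // lengths match
        System.out.println(isTruncated(null, raw));  // no header present
        System.out.println(isTruncated("abc", raw)); // malformed header
    }
}
```

One difference from my method: there, a malformed Content-Length makes Integer.parseInt throw, which ends up in the catch block and fails the parse. The sketch treats a malformed header as "unknown" and lets the parse continue, which is a design choice rather than something Nutch prescribes.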


When I run it, I get this exception:

java.lang.NullPointerException
        at org.apache.nutch.parse.ParseData.write(ParseData.java:163)
        at org.apache.nutch.parse.ParseImpl.write(ParseImpl.java:55)
        at org.apache.nutch.fetcher.FetcherOutput.write(FetcherOutput.java:63)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:315)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:403)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:164)
fetch of http://www2.ati.com/misc/themes/ATI_ThemeManager_July2004.exe failed with:
java.lang.NullPointerException

Thanks,

Eyal.


-- 
Eyal Edri

Re: How to write a parse plugin and not get NullPointerException on ParseData

Posted by eyal edri <ey...@gmail.com>.
Hi,

I'm interested in submitting the parse-exe (download exe) plugin to the
wiki, but I still have one small issue to solve before doing so.
Can anyone help with the NullPointerException issue?

Thanks,

-- 
Eyal Edri