Posted to dev@nutch.apache.org by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/07/03 20:45:00 UTC

[jira] [Commented] (NUTCH-2614) NPE in CrawlDbReader

    [ https://issues.apache.org/jira/browse/NUTCH-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531904#comment-16531904 ] 

Sebastian Nagel commented on NUTCH-2614:
----------------------------------------

The lines in the stack trace
{code}
513      LongWritable totalCnt = ((LongWritable) stats.get("T"));
514      stats.remove("T");
515      LOG.info("TOTAL urls:\t" + totalCnt.get());
{code}
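suggest a trivial reason: an empty CrawlDb. With no entries counted, stats.get("T") returns null, so totalCnt.get() throws the NPE. To reproduce: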
{noformat}
% rm -r crawldb/  # make sure to start a new CrawlDb
% bin/nutch inject crawldb/ /dev/null
% bin/nutch readdb crawldb/ -stats
...
Exception in thread "main" java.lang.NullPointerException
        at org.apache.nutch.crawl.CrawlDbReader.processStatJob(CrawlDbReader.java:555)
        at org.apache.nutch.crawl.CrawlDbReader.run(CrawlDbReader.java:914)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.nutch.crawl.CrawlDbReader.main(CrawlDbReader.java:980)
{noformat}
This is also reproducible with earlier versions (I've tried 1.14). It should be trivial to fix.
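
A minimal sketch of the kind of null guard that would avoid the NPE, assuming the missing "T" entry is indeed the cause. This is not the actual patch: the class and method names (StatsTotalSketch, totalUrls) are hypothetical, and a plain Long stands in for Hadoop's LongWritable so the example runs without Hadoop on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the stats handling in CrawlDbReader.processStatJob.
// Assumption: "T" holds the total URL count and is absent for an empty CrawlDb.
class StatsTotalSketch {

  // Returns the total URL count, or 0 when the "T" entry is missing.
  static long totalUrls(Map<String, Long> stats) {
    // Map.remove returns the previous value, or null if the key was absent,
    // which covers both the get("T") and remove("T") steps in one call.
    Long totalCnt = stats.remove("T");
    return totalCnt == null ? 0L : totalCnt;
  }
}
```

With such a guard, readdb -stats on an empty CrawlDb could report a total of 0 instead of failing.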

> NPE in CrawlDbReader
> --------------------
>
>                 Key: NUTCH-2614
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2614
>             Project: Nutch
>          Issue Type: Bug
>          Components: crawldb
>    Affects Versions: 1.14, 1.15
>            Reporter: Markus Jelsma
>            Priority: Major
>             Fix For: 1.15
>
>
> Got this in master:
> {code}
> Exception in thread "main" java.lang.NullPointerException
>         at org.apache.nutch.crawl.CrawlDbReader.processStatJob(CrawlDbReader.java:555)
>         at org.apache.nutch.crawl.CrawlDbReader.run(CrawlDbReader.java:914)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.nutch.crawl.CrawlDbReader.main(CrawlDbReader.java:980)
> {code}
> Not sure why it happens or which commit caused the problem.


