Posted to dev@nutch.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2013/04/02 22:55:08 UTC
Re: dev Digest 2 Apr 2013 18:42:33 -0000 Issue 1587
Hi Binoy,
On Tue, Apr 2, 2013 at 11:42 AM, <de...@nutch.apache.org> wrote:
>
> Re: Nutch2.x Null Pointer Exception in IndexerJob.Java for a fresh crawl
> with One Seed.
> 22979 by: Binoy d
>
> Hi Lewis,
> I understand the HEAD branch can be unstable some of the time. I was
> trying to point out that I was not able to reproduce the issue with HEAD
> for 2.x. I will try to create the JIRA issue after I am back from the
> office. I try not to create JIRA issues without confirming the problem
> first; they just tend to add noise. I haven't used the crawl scripts much,
> so it might take some time for me to get logs from there.
>
Anything you can do to help us better understand the source of the issue is
greatly appreciated, Binoy. Thank you (and the others who are helping on
these issues) for your perseverance; it is of real value to the Nutch community.
Best
Lewis
Re: dev Digest 2 Apr 2013 18:42:33 -0000 Issue 1587
Posted by kaveh minooie <ka...@plutoz.com>.
Hi,
I am not sure if Binoy is talking about this, but here it is:
the original exception comes from
src/java/org/apache/nutch/indexer/IndexUtil.java, line 66:
public NutchDocument index(String key, WebPage page) {
    NutchDocument doc = new NutchDocument();
    doc.add("id", key);
    doc.add("digest", StringUtil.toHexString(page.getSignature().array()));
==>> doc.add("batchId", page.getBatchId().toString());
page.getBatchId() returns null for every URL. My guess is that updatedb
removes the batchId from the rows in the webpage table: generate and
fetch work fine with the batchId, but after updatedb runs, solrindex can't
find the batchIds. (By the way, updatedb does not accept a batchId as one
of its parameters, which means that it goes over the entire webpage table
every time you run it, but that is a different issue.)
Though I am not sure; I am going over the code right after I hit send :)
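To make the failure mode concrete, here is a minimal sketch of it, not the actual Nutch code. Page and BatchIdGuardSketch are hypothetical stand-ins I made up for illustration, and a plain Map stands in for NutchDocument. It only shows why calling toString() on a null batchId throws, and what a defensive null check would look like. Whether such a guard is the right fix, as opposed to making sure the batchId survives updatedb, is still an open question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins, NOT the real Nutch classes: just enough to
// reproduce the null dereference described above.
public class BatchIdGuardSketch {

    /** Stand-in for WebPage; batchId may be null, as observed after updatedb. */
    static final class Page {
        private final CharSequence batchId;

        Page(CharSequence batchId) {
            this.batchId = batchId;
        }

        CharSequence getBatchId() {
            return batchId;
        }
    }

    /** Same shape as the failing line: throws NullPointerException when batchId is null. */
    static Map<String, String> indexUnguarded(String key, Page page) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", key);
        doc.put("batchId", page.getBatchId().toString());
        return doc;
    }

    /** Defensive variant: adds the batchId field only when a value is present. */
    static Map<String, String> indexGuarded(String key, Page page) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", key);
        CharSequence batchId = page.getBatchId();
        if (batchId != null) {
            doc.put("batchId", batchId.toString());
        }
        return doc;
    }

    public static void main(String[] args) {
        Page missing = new Page(null);
        try {
            indexUnguarded("http://example.com/", missing);
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }
        System.out.println("guarded: " + indexGuarded("http://example.com/", missing));
    }
}
```

Note that the guard only hides the symptom: documents would be indexed without a batchId field, which may or may not be acceptable depending on how the index is queried.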
--
Kaveh Minooie