
Posted to dev@nutch.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2013/04/02 22:55:08 UTC

Re: dev Digest 2 Apr 2013 18:42:33 -0000 Issue 1587

Hi Binoy,

On Tue, Apr 2, 2013 at 11:42 AM, <de...@nutch.apache.org> wrote:

>
> Re: Nutch2.x Null Pointer Exception in IndexerJob.Java for a fresh crawl
> with One Seed.
>         22979 by: Binoy d
>
> Hi Lewis,
> I understand the head branch can be unstable some of the time. I was
> trying to point out that I was not able to reproduce the issue with HEAD
> for 2.x. I will try to create the JIRA issue after I am back from the
> office. I try not to create JIRA issues without confirming the problem
> first; they just tend to add noise. I haven't used the crawl scripts much,
> so it might take some time for me to get logs from there.
>

Anything you can do to help us better understand the source of the issue is
greatly appreciated, Binoy. Thank you (and the others who are helping on
these issues) for your perseverance; it is of real value to the Nutch
community.
Best
Lewis

Re: dev Digest 2 Apr 2013 18:42:33 -0000 Issue 1587

Posted by kaveh minooie <ka...@plutoz.com>.

Hi

So I am not sure if Binoy is talking about this, but here it is:

The original exception comes from
src/java/org/apache/nutch/indexer/IndexUtil.java, line 66:

  public NutchDocument index(String key, WebPage page) {
    NutchDocument doc = new NutchDocument();
    doc.add("id", key);
    doc.add("digest", StringUtil.toHexString(page.getSignature().array()));
    // ==>> line 66: the NullPointerException is thrown here
    doc.add("batchId", page.getBatchId().toString());

page.getBatchId() returns null for every URL. My guess is that updatedb
removes the batchId from the rows in the webpage table: generate and
fetch work fine with batchId, but after updatedb has run, solrindex can't
find the batchIds. (By the way, updatedb does not accept batchId as one of
its parameters, which means that it goes over the entire webpage table
every time you run it, but that is a different issue.)

Though I am not sure; I am going over the code right after I hit send :)
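
If the null batchId does turn out to be the cause, one defensive option
would be to check for null before calling toString(). Below is a minimal,
self-contained sketch of that idea, not Nutch code: BatchIdGuard and
addBatchId are hypothetical names, a plain Map stands in for NutchDocument,
and a CharSequence stands in for whatever page.getBatchId() returns. It
only avoids the exception; it does not explain why the batchId is missing.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for the indexing step; these are not Nutch classes.
 * A Map plays the role of NutchDocument, and a CharSequence plays the role
 * of the value returned by page.getBatchId().
 */
class BatchIdGuard {

  /** Adds "batchId" only when one is present; returns whether it was added. */
  public static boolean addBatchId(Map<String, String> doc, CharSequence batchId) {
    if (batchId == null) {
      // Calling batchId.toString() at this point is what would throw the NPE.
      return false;
    }
    doc.put("batchId", batchId.toString());
    return true;
  }

  public static void main(String[] args) {
    Map<String, String> doc = new HashMap<>();
    doc.put("id", "example-row-key");

    // Simulates a row whose batchId is missing (the case described above).
    boolean added = addBatchId(doc, null);
    System.out.println("added=" + added + " hasBatchId=" + doc.containsKey("batchId"));

    // Simulates a row that still carries its batchId (made-up value).
    added = addBatchId(doc, "example-batch-1");
    System.out.println("added=" + added + " batchId=" + doc.get("batchId"));
  }
}
```

Skipping the field silently is only one possible choice; logging the row key
or failing with a clearer message might be preferable while the root cause
is still unknown.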


-- 
Kaveh Minooie