Posted to user@nutch.apache.org by "Krishnanand, Kartik" <ka...@bankofamerica.com> on 2015/01/13 21:52:21 UTC

Parser not returning any results

Hi,

As a Nutch newbie, I am trying to crawl a single URL at a depth of 1, and I am seeing the following behavior:

I don't know why this is happening. I also loaded the URL in a browser, and that did not work for me either. What could be causing this behavior? Any advice would be greatly appreciated.

2015-01-12 16:53:48,237 INFO  fetcher.Fetcher - fetching http://promo.bank.com (queue crawl delay=5000ms)
2015-01-12 16:54:57,278 INFO  parse.ParseSegment - Skipping http://promo.bank.com as content is not fetched successfully.

Thanks,

Kartik

----------------------------------------------------------------------
This message, and any attachments, is for the intended recipient(s) only, may contain information that is privileged, confidential and/or proprietary and subject to important terms and conditions available at http://www.bankofamerica.com/emaildisclaimer.   If you are not the intended recipient, please delete this message.

Re: Parser not returning any results

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi Kartik,

I've tried the same URL and parsing worked well with Nutch 1.x (trunk).

Which Nutch version are you using?

The message indicates that the fetch did not succeed, i.e. the server did not
answer with HTTP status 200. That can happen (it could be a temporary failure).

If the logs indicate no failure, you can get more
information via

 % bin/nutch readdb
and for 1.x also:
 % bin/nutch readseg
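A minimal shell sketch of the check suggested above, assuming Nutch 1.x in local mode, run from the Nutch home directory, with logs in `logs/hadoop.log` and a crawl directory laid out as `crawl/crawldb` and `crawl/segments/...`. These paths and the helper function names (`skipped_urls`, `inspect_skipped`, `dump_segment_status`) are hypothetical; adjust them to your setup. The sketch pulls the URLs that ParseSegment skipped out of the log, looks each one up in the CrawlDb with `readdb -url`, and dumps a segment with `readseg -dump` while leaving out raw content, parse data and parse text.

```shell
#!/bin/sh
# Sketch only: assumes Nutch 1.x (local mode), run from the Nutch home directory.
# Paths such as logs/hadoop.log and crawl/crawldb are assumptions; adjust as needed.

# skipped_urls LOG_FILE
# Prints each URL that ParseSegment skipped because its fetch did not succeed.
skipped_urls() {
  sed -n 's/.*parse\.ParseSegment - Skipping \([^ ]*\) as content is not fetched successfully\..*/\1/p' "$1" | sort -u
}

# inspect_skipped LOG_FILE CRAWLDB
# Shows the CrawlDb record (status, fetch time, metadata) of every skipped URL.
inspect_skipped() {
  skipped_urls "$1" | while read -r url; do
    echo "== $url"
    bin/nutch readdb "$2" -url "$url"
  done
}

# dump_segment_status SEGMENT_DIR OUT_DIR
# Dumps a segment without raw content, parse data or parse text,
# so the per-URL status records stay readable.
dump_segment_status() {
  bin/nutch readseg -dump "$1" "$2" -nocontent -noparsedata -noparsetext
}
```

Example usage: `inspect_skipped logs/hadoop.log crawl/crawldb`, then `dump_segment_status crawl/segments/<segment> segdump`, where `<segment>` is the timestamped segment directory of the failed fetch.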

Best,
Sebastian

