Posted to user@nutch.apache.org by A Laxmi <a....@gmail.com> on 2013/08/01 22:56:32 UTC

fetch failed with: Http code = 403

For some reason, I am not able to crawl; the fetcher seems to have an
issue. It complains: "fetch of http://www.someurldomain.com/ failed with:
Http code = 403, url = http://www.someurldomain.com/".

Please help. I tried to google this issue but could not find anything that
addresses it.

Re: fetch failed with: Http code = 403

Posted by feng lu <am...@gmail.com>.
I agree with Sebastian's suggestion that you can use a network traffic
analyzer to compare the HTTP request and response headers between Nutch
and the browser. Maybe they send different request headers.
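
As a quick check that doesn't need a packet capture, here is a minimal
sketch (Python; the URL and both agent strings are placeholders, not
anything Nutch ships) that requests the page once with a browser-like
User-Agent and once with a crawler-style one and prints the status code
for each. If the crawler-style request gets 403 while the browser-like
one gets 200, the server is filtering on request headers.

import urllib.error
import urllib.request

# Placeholders: substitute your real URL and the agent string Nutch
# actually sends (it is built from http.agent.name in nutch-site.xml).
URL = "http://www.someurldomain.com/"
AGENTS = {
    "browser-like": "Mozilla/5.0 (X11; Linux x86_64) Firefox/22.0",
    "crawler-like": "MyNutchCrawler/1.0",
}

for label, agent in AGENTS.items():
    req = urllib.request.Request(URL, headers={"User-Agent": agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(label, resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx; report the code anyway.
        print(label, err.code)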


On Fri, Aug 2, 2013 at 7:16 AM, A Laxmi <a....@gmail.com> wrote:

> Sebastian - thanks for your help!
>
> I can access the link from a browser without any issue. I am getting "fetch
> failed with: Http code = 403" only while the crawler is trying to fetch it.
>
> On Thursday, August 1, 2013, Sebastian Nagel <wa...@googlemail.com>
> wrote:
> > Hi,
> >
> > Why are you sure that you didn't get a real 403 (Forbidden)?
> > - Does the answering web server log a delivery with 200 (OK)?
> > - Does a network traffic analyzer (Wireshark, tcpdump) show
> >   that the HTTP response headers carry a different status code?
> >
> > In general, servers may deliver different responses to a crawler
> > and a browser, or even refuse to deliver a document.
> >
> > Sebastian
> >
> > On 08/01/2013 10:56 PM, A Laxmi wrote:
> >> For some reason, I am not able to crawl; the fetcher seems to have an
> >> issue. It complains: "fetch of http://www.someurldomain.com/ failed with:
> >> Http code = 403, url = http://www.someurldomain.com/".
> >>
> >> Please help. I tried to google this issue but could not find anything
> >> that addresses it.
> >>
> >
> >
>



-- 
Don't Grow Old, Grow Up... :-)

Re: fetch failed with: Http code = 403

Posted by A Laxmi <a....@gmail.com>.
Sebastian - thanks for your help!

I can access the link from a browser without any issue. I am getting "fetch
failed with: Http code = 403" only while the crawler is trying to fetch it.

On Thursday, August 1, 2013, Sebastian Nagel <wa...@googlemail.com>
wrote:
> Hi,
>
> Why are you sure that you didn't get a real 403 (Forbidden)?
> - Does the answering web server log a delivery with 200 (OK)?
> - Does a network traffic analyzer (Wireshark, tcpdump) show
>   that the HTTP response headers carry a different status code?
>
> In general, servers may deliver different responses to a crawler
> and a browser, or even refuse to deliver a document.
>
> Sebastian
>
> On 08/01/2013 10:56 PM, A Laxmi wrote:
>> For some reason, I am not able to crawl; the fetcher seems to have an
>> issue. It complains: "fetch of http://www.someurldomain.com/ failed with:
>> Http code = 403, url = http://www.someurldomain.com/".
>>
>> Please help. I tried to google this issue but could not find anything
>> that addresses it.
>>
>
>

Re: fetch failed with: Http code = 403

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi,

Why are you sure that you didn't get a real 403 (Forbidden)?
- Does the answering web server log a delivery with 200 (OK)?
- Does a network traffic analyzer (Wireshark, tcpdump) show
  that the HTTP response headers carry a different status code?

In general, servers may deliver different responses to a crawler
and a browser, or even refuse to deliver a document.
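
If Wireshark or tcpdump is not at hand, a small sketch along these lines
(Python; the host and User-Agent value are placeholders) sends one
crawler-style request and prints the raw status line and response
headers exactly as the server returns them:

import socket

# Placeholders: use your real host and the agent string Nutch sends.
HOST = "www.someurldomain.com"
REQUEST = (
    "GET / HTTP/1.1\r\n"
    "Host: " + HOST + "\r\n"
    "User-Agent: MyNutchCrawler/1.0\r\n"
    "Accept: */*\r\n"
    "Connection: close\r\n"
    "\r\n"
)

with socket.create_connection((HOST, 80), timeout=10) as sock:
    sock.sendall(REQUEST.encode("ascii"))
    raw = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        raw += chunk

# Everything before the first blank line is the status line plus headers.
print(raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1"))

If it prints "HTTP/1.1 403 Forbidden" while a browser loads the same
page fine, the server really is answering the crawler differently.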

Sebastian

On 08/01/2013 10:56 PM, A Laxmi wrote:
> For some reason, I am not able to crawl; the fetcher seems to have an
> issue. It complains: "fetch of http://www.someurldomain.com/ failed with:
> Http code = 403, url = http://www.someurldomain.com/".
> 
> Please help. I tried to google this issue but could not find anything that
> addresses it.
>