Posted to agent@nutch.apache.org by SebaZ <se...@gmail.com> on 2012/06/06 13:17:02 UTC
HTTP REFERER is missing
I have successfully set up Nutch as the crawler for a Solr index on the
http://szukaj.ug.edu.pl site, but there is a problem with the HTTP Referer
header: Nutch does not send it when crawling sites.
Is it possible to make Nutch send the Referer header with each request?
Scenario:
1. Nutch opens www.domain.pl.
2. Nutch finds a link to www.domain.pl/abcd.pdf.
3. Nutch requests www.domain.pl/abcd.pdf, but without
HTTP_REFERER=www.domain.pl.
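To make the expected behaviour in step 3 concrete, here is a minimal sketch in plain Java (java.net.http, Java 11+) of a request that carries the Referer header. It is only an illustration of the header itself, not Nutch's own code or configuration; the class name RefererSketch and the helper requestWithReferer are hypothetical, and the http:// scheme on the example URLs is assumed.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class RefererSketch {

    // Hypothetical helper: builds a GET request for `target` that names
    // `referringPage` (the page where the link was found) in the Referer header.
    static HttpRequest requestWithReferer(String target, String referringPage) {
        return HttpRequest.newBuilder()
                .uri(URI.create(target))
                .header("Referer", referringPage)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Step 3 of the scenario: the PDF is requested, with the page that
        // linked to it given as the referrer. No network call is made here;
        // the request object is only built and inspected.
        HttpRequest req = requestWithReferer(
                "http://www.domain.pl/abcd.pdf",
                "http://www.domain.pl/");
        System.out.println(req.headers().firstValue("Referer").orElse("(none)"));
    }
}
```

On the server side, a request built this way would show up with HTTP_REFERER set to the referring page, which is what the crawl currently lacks.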
--
View this message in context: http://lucene.472066.n3.nabble.com/HTTP-REFERER-is-missing-tp3987959.html
Sent from the Nutch - Agent mailing list archive at Nabble.com.
RE: HTTP REFERER is missing
Posted by Markus Jelsma <ma...@openindex.io>.
Hi,
You posted this to the Nutch agent mailing list. Please use the user mailing list instead, which is the one intended for user questions.
Thanks