Posted to agent@nutch.apache.org by SebaZ <se...@gmail.com> on 2012/06/06 13:17:02 UTC

HTTP REFERER is missing

I have successfully set up Nutch as the crawler for a Solr index on the
http://szukaj.ug.edu.pl site. However, there is a problem with the HTTP
Referer header: Nutch does not send it when crawling sites.

Is it possible to configure Nutch to send the Referer header with each request?

Scenario:
1. Nutch opens www.domain.pl
2. Nutch finds a link to www.domain.pl/abcd.pdf.
3. Nutch requests www.domain.pl/abcd.pdf, but without
HTTP_REFERER=www.domain.pl
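
To make step 3 concrete, here is a minimal sketch in plain Java of a request that does carry the header. It is not Nutch code and does not use any Nutch API. The class RefererSketch and the helper prepare are hypothetical names chosen for illustration, and the http:// scheme on the URLs is an assumption. The request is only prepared, not sent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class RefererSketch {
    // Hypothetical helper, not part of Nutch: prepares (but does not send)
    // a GET request for `target` that names `parent` as the referring page.
    static HttpURLConnection prepare(String target, String parent) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(target).toURL().openConnection();
            conn.setRequestMethod("GET");
            // On the wire the header is spelled "Referer"; a CGI-style
            // server exposes it to scripts as HTTP_REFERER.
            conn.setRequestProperty("Referer", parent);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed URLs, taken from the scenario above with http:// added.
        HttpURLConnection conn =
            prepare("http://www.domain.pl/abcd.pdf", "http://www.domain.pl/");
        System.out.println("Referer: " + conn.getRequestProperty("Referer"));
    }
}
```

This is the behaviour I would expect from the crawler: the page on which the link was found is passed along as the Referer when the linked document is fetched.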



--
View this message in context: http://lucene.472066.n3.nabble.com/HTTP-REFERER-is-missing-tp3987959.html
Sent from the Nutch - Agent mailing list archive at Nabble.com.

RE: HTTP REFERER is missing

Posted by Markus Jelsma <ma...@openindex.io>.
Hi,

You posted this to the Nutch agent mailing list. Please use the user mailing list instead, which is the one for user questions.

Thanks

 
 