
Posted to user@nutch.apache.org by Ernesto De Santis <de...@yahoo.com.ar> on 2006/09/07 05:56:09 UTC

Nutch can't fetch pages when the URLs contain another URL

Hi

I'm trying to crawl a site with links like this:

<a href="/some/http://another.site.com/">....</a>

The resolved URL is:
http://site.one.com/xxx/http://another.site.com/
This is an actual page on site.one.com; it does not redirect to
another.site.com.
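For reference, here is a minimal sketch of how a relative link like the one above is resolved. It uses Python's `urllib.parse.urljoin`. The page URL `http://site.one.com/xxx/index.html` is hypothetical, and `/some/` and `/xxx/` are treated as stand-ins for the real path. Because the href starts with `/`, it is an absolute-path reference. It keeps the scheme and host of the page it appears on (RFC 3986, section 5.2), and the embedded `http:` is just an ordinary path segment, not a new URL.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical page on site.one.com that contains the link from the question.
BASE_PAGE = "http://site.one.com/xxx/index.html"
# The href exactly as it appears in the question.
HREF = "/some/http://another.site.com/"


def resolve(base, href):
    """Resolve an href against the URL of the page it appears on."""
    return urljoin(base, href)


resolved = resolve(BASE_PAGE, HREF)
print(resolved)
# The host stays site.one.com; "another.site.com" only appears in the path.
print(urlparse(resolved).netloc)  # site.one.com
```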

I have tried a lot of things.

I commented out all the minus (exclude) rules in crawl-urlfilter.txt.
I also tried different rules, without success.
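Here is a minimal sketch of how crawl-urlfilter.txt rules are evaluated, following the comment at the top of that file. Each non-comment line is a regular expression prefixed with `+` (include) or `-` (exclude). The first matching pattern decides, and a URL that matches no pattern is ignored. The sketch is written in Python rather than Nutch's Java. It assumes that "matching" means the pattern is found anywhere in the URL, and that these simple patterns behave the same under Python's `re` as under Java's regex engine. The filter text is hypothetical, with `site.one.com` standing in for the real domain. It is only meant for checking a rule set against a URL by hand.

```python
import re

# Hypothetical filter text in the crawl-urlfilter.txt style (not Nutch's
# actual defaults, which vary by version).
FILTER_TEXT = r"""
# skip file:, ftp:, and mailto: urls
-^(file|ftp|mailto):
# skip URLs containing certain characters
-[?*!@=]
# accept hosts in site.one.com
+^http://([a-z0-9]*\.)*site\.one\.com/
# skip everything else
-.
"""


def parse_rules(lines):
    """Turn '+regex' / '-regex' lines into (include, compiled_regex) pairs,
    skipping blank lines and '#' comments."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """The first rule whose pattern is found in the URL decides;
    if no rule matches, the URL is rejected."""
    for include, regex in rules:
        if regex.search(url):
            return include
    return False


rules = parse_rules(FILTER_TEXT.splitlines())
print(accepts(rules, "http://site.one.com/xxx/http://another.site.com/"))
```

Under these hypothetical rules the nested URL is accepted. If the real rule set behaves the same way, the URL filter alone would not explain the missing pages.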

Does anyone know why Nutch doesn't fetch these pages?

Many thanks,
Ernesto.
