Posted to user@nutch.apache.org by Ernesto De Santis <de...@yahoo.com.ar> on 2006/09/07 05:56:09 UTC
Nutch can't fetch pages when the URLs contain another URL
Hi
I'm trying to crawl a site with links like this:
<a href="/some/http://another.site.com/">....</a>
The final url is:
http://site.one.com/xxx/http://another.site.com/
This is an actual page on site.one.com; it does not redirect to
another.site.com.
I have tried a lot of things.
I commented out all the minus (-) rules in crawl-urlfilter.txt.
I also tried different rules, without success.
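One way to check offline whether the filter rules alone would reject such a URL is sketched below. It is only an illustration: it assumes the rules are tried in order, that the first regex found anywhere in the URL decides, and that a URL matching no rule is rejected. The function name and the rule set are hypothetical, and Python's regex syntax may differ in details from the Java one that Nutch uses.

```python
import re

def accepts(rules, url):
    """Return True if the first matching rule starts with '+', else False.

    Assumed semantics: rules are tried in order, the first regex found
    anywhere in the URL decides, and a URL matching no rule is rejected.
    """
    for line in rules:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out rules
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        if re.search(pattern, url):
            return sign == "+"
    return False

# Hypothetical rule set: one commented-out minus rule, one accept rule
# for the crawled site, and a final catch-all that rejects everything else.
RULES = [
    r"# -[?*!@=]",
    r"+^http://([a-z0-9]*\.)*site\.one\.com/",
    r"-.",
]

URL = "http://site.one.com/xxx/http://another.site.com/"

if __name__ == "__main__":
    print(accepts(RULES, URL))
```

If this prints True for the rules actually in use, the filter file is probably not what drops the URL, and the cause would have to be looked for elsewhere.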
Does anyone know why Nutch doesn't fetch these pages?
Many thanks,
Ernesto.