Posted to user@nutch.apache.org by Hrishikesh Agashe <hr...@persistent.co.in> on 2009/08/25 09:19:21 UTC
Regarding relative paths
Hi,
It seems that Nutch does not consider URLs with relative paths, e.g. <img src="../img/abc.jpg">.
Is there a flag or patch to enable this in 1.0? If not, does anyone have an idea how this could be achieved by changing the code?
--Hrishi
DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
Re: Regarding relative paths
Posted by reinhard schwab <re...@aon.at>.
There is a config option in nutch-default.xml:
<property>
  <name>db.ignore.internal.links</name>
  <value>true</value>
  <description>If true, when adding new links to a page, links from
  the same host are ignored. This is an effective way to limit the
  size of the link database, keeping only the highest quality
  links.
  </description>
</property>
Override it in nutch-site.xml with a value of false.
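For illustration, the override might look roughly like the fragment below. This is a minimal sketch, assuming the usual conf/nutch-site.xml layout with a <configuration> root element; only the property name and the false value come from the advice above, and any other properties already in your nutch-site.xml should stay alongside it.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: site-specific settings that take precedence over nutch-default.xml -->
<configuration>
  <property>
    <name>db.ignore.internal.links</name>
    <!-- false: keep links that point to the same host instead of dropping them -->
    <value>false</value>
  </property>
</configuration>
```

Leaving nutch-default.xml untouched and placing the change in nutch-site.xml keeps local settings separate from the shipped defaults.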