Posted to user@nutch.apache.org by Hrishikesh Agashe <hr...@persistent.co.in> on 2009/08/25 09:19:21 UTC

Regarding relative paths

Hi,

It seems that Nutch does not follow URLs with relative paths, e.g. <img src="../img/abc.jpg">.
Is there any flag or patch to enable this in 1.0? If not, does anyone have an idea how this could be achieved by changing the code?

--Hrishi

DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
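Before a relative reference like ../img/abc.jpg can be fetched, it has to be resolved against the URL of the page that contains it. The sketch below illustrates only that resolution step, using the standard java.net.URI.resolve method. It is not Nutch's own parsing code, and the page URL is a hypothetical example.

```java
import java.net.URI;

public class RelativeLinkDemo {

    // Resolves a link target (relative or absolute) against the URL of the
    // page it was found on, using the RFC 2396 rules implemented by java.net.URI.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Hypothetical page URL; the relative src is the one from the question.
        String page = "http://www.example.com/docs/index.html";
        System.out.println(resolve(page, "../img/abc.jpg"));
        // An absolute href is returned unchanged.
        System.out.println(resolve(page, "http://cdn.example.org/logo.png"));
    }
}
```

Running it prints http://www.example.com/img/abc.jpg (the ".." segment removes /docs/), followed by the absolute URL unchanged.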

Re: Regarding relative paths

Posted by reinhard schwab <re...@aon.at>.
There is a config option in nutch-default.xml:

<property>
  <name>db.ignore.internal.links</name>
  <value>true</value>
  <description>If true, when adding new links to a page, links from
  the same host are ignored.  This is an effective way to limit the
  size of the link database, keeping only the highest quality
  links.
  </description>
</property>

Override it in nutch-site.xml with a value of false.
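A minimal nutch-site.xml sketch of that override, assuming the usual Nutch 1.0 layout in which site-specific properties sit inside a single <configuration> element (any properties already in the file stay alongside this one):

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: site-specific overrides of nutch-default.xml -->
<configuration>
  <property>
    <name>db.ignore.internal.links</name>
    <!-- false: links from the same host are no longer ignored -->
    <value>false</value>
  </property>
</configuration>
```

Values in nutch-site.xml take precedence over the defaults in nutch-default.xml, so only the changed property needs to appear there.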