Posted to dev@nutch.apache.org by "Sami Siren (JIRA)" <ji...@apache.org> on 2006/11/18 22:59:42 UTC

[jira] Resolved: (NUTCH-388) nutch-default.xml has outdated example for urlfilter.order

     [ http://issues.apache.org/jira/browse/NUTCH-388?page=all ]

Sami Siren resolved NUTCH-388.
------------------------------

    Fix Version/s: 0.9.0
       Resolution: Fixed

This is now fixed (rev 476617). Thanks for reporting it!

> nutch-default.xml has outdated example for urlfilter.order
> ----------------------------------------------------------
>
>                 Key: NUTCH-388
>                 URL: http://issues.apache.org/jira/browse/NUTCH-388
>             Project: Nutch
>          Issue Type: Bug
>          Components: documentation, fetcher
>    Affects Versions: 0.8.1, 0.8
>            Reporter: Jared Dunne
>            Priority: Minor
>             Fix For: 0.9.0
>
>
> The description of the urlfilter.order entry in nutch-default.xml is outdated and misleading. Its example refers to org.apache.nutch.net.RegexURLFilter and org.apache.nutch.net.PrefixURLFilter, when it should refer to org.apache.nutch.urlfilter.regex.RegexURLFilter and org.apache.nutch.urlfilter.prefix.PrefixURLFilter.
> <property>
>   <name>urlfilter.order</name>
>   <value></value>
>   <description>The order by which url filters are applied.
>   If empty, all available url filters (as dictated by properties
>   plugin-includes and plugin-excludes above) are loaded and applied in system
>   defined order. If not empty, only named filters are loaded and applied
>   in given order. For example, if this property has value:
>   org.apache.nutch.net.RegexURLFilter org.apache.nutch.net.PrefixURLFilter
>   then RegexURLFilter is applied first, and PrefixURLFilter second.
>   Since all filters are AND'ed, filter ordering does not have impact
>   on end result, but it may have performance implication, depending
>   on relative expensiveness of filters.
>   </description>
> </property>
> We wanted to run the prefix filter before the regex filter, so we copied the example from the description and reversed it. Because these package names are incorrect, it did not work, and the following warnings appeared in our logs for each of the URLs in our fetchlist:
> 2006-10-17 15:55:46,533 WARN  crawl.Injector - Skipping http://bar.foo.com/:java.lang.NullPointerException
> 2006-10-17 15:55:46,533 WARN  crawl.Injector - Skipping http://baz.foo.com/:java.lang.NullPointerException
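For reference, a corrected override that applies PrefixURLFilter before RegexURLFilter might look like the following. This is a sketch, not the committed fix: it assumes the property is overridden in nutch-site.xml (the usual place for local settings) and that both the urlfilter-prefix and urlfilter-regex plugins are enabled via plugin.includes. Only the fully qualified class names are taken from this report. If a listed name does not match a loaded filter, the Injector may fail in the same way as the log lines above.

```xml
<!-- Sketch of a nutch-site.xml override (assumed location).
     Runs the prefix filter (presumably the cheaper check) before the
     regex filter, using the corrected package names from NUTCH-388.
     Class names are separated by whitespace, as in the description's example. -->
<property>
  <name>urlfilter.order</name>
  <value>org.apache.nutch.urlfilter.prefix.PrefixURLFilter org.apache.nutch.urlfilter.regex.RegexURLFilter</value>
</property>
```

Since all filters are AND'ed, swapping the order should not change which URLs are accepted, only how much work is done per URL.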

-- 
This message is automatically generated by JIRA.