Posted to commits@nutch.apache.org by sn...@apache.org on 2017/12/15 16:50:58 UTC

[nutch] branch 2.x updated: NUTCH-2035 urlfilter-regex case insensitive rules

This is an automated email from the ASF dual-hosted git repository.

snagel pushed a commit to branch 2.x
in repository https://gitbox.apache.org/repos/asf/nutch.git


The following commit(s) were added to refs/heads/2.x by this push:
     new ba4b2d4  NUTCH-2035 urlfilter-regex case insensitive rules
ba4b2d4 is described below

commit ba4b2d495feb9351f7c07c767ecf9f4672cee2e3
Author: Sebastian Nagel <sn...@apache.org>
AuthorDate: Fri Dec 15 17:50:39 2017 +0100

    NUTCH-2035 urlfilter-regex case insensitive rules
---
 conf/regex-urlfilter.txt.template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/conf/regex-urlfilter.txt.template b/conf/regex-urlfilter.txt.template
index 78b2b31..bcf9c87 100644
--- a/conf/regex-urlfilter.txt.template
+++ b/conf/regex-urlfilter.txt.template
@@ -27,7 +27,7 @@
 
 # skip image and other suffixes we can't yet parse
 # for a more extensive coverage use the urlfilter-suffix plugin
--\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP|ppt|PPT|mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|mov|MOV|exe|EXE|jpeg|JPEG|bmp|BMP|js|JS)$
+-(?i)\.(gif|jpg|png|ico|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|exe|jpeg|bmp|js)$
 
 # skip URLs containing certain characters as probable queries, etc.
 -[?*!@=]
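
A rough sketch of what the rule change means in practice. It uses Python's `re` module as a stand-in for the Java regex engine Nutch runs on, and it assumes the filter applies each rule as an unanchored search against the URL. The leading `-` (the exclude sign of the rule) is dropped, the `excluded` helper and the example.com URLs are made up for illustration, and the two patterns are copied from the diff above.

```python
import re

# Regex part of the old rule: every suffix listed twice, all-lower and all-upper.
OLD = re.compile(
    r"\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF"
    r"|zip|ZIP|ppt|PPT|mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|mov|MOV"
    r"|exe|EXE|jpeg|JPEG|bmp|BMP|js|JS)$"
)

# Regex part of the new rule: (?i) makes the whole pattern case-insensitive.
NEW = re.compile(
    r"(?i)\.(gif|jpg|png|ico|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz"
    r"|mov|exe|jpeg|bmp|js)$"
)


def excluded(pattern, url):
    """Hypothetical helper: True if the exclude rule would match the URL."""
    return pattern.search(url) is not None


if __name__ == "__main__":
    # Made-up URLs covering lower-, upper-, and mixed-case suffixes.
    for url in (
        "http://example.com/logo.png",
        "http://example.com/logo.PNG",
        "http://example.com/logo.Png",
        "http://example.com/index.html",
    ):
        print(url, "old:", excluded(OLD, url), "new:", excluded(NEW, url))
```

Under these assumptions the two rules agree on all-lowercase and all-uppercase suffixes. They differ only on mixed-case ones such as `.Png`, which the old rule let through and the new rule skips.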
