Posted to commits@nutch.apache.org by sn...@apache.org on 2021/02/01 10:02:15 UTC

[nutch] branch master updated: NUTCH-2845 Complete rules of urlfilter-suffix, add more excluded file suffixes for - images - audio and video formats - software packages and archives - fonts

This is an automated email from the ASF dual-hosted git repository.

snagel pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git


The following commit(s) were added to refs/heads/master by this push:
     new 2cf4d62  NUTCH-2845 Complete rules of urlfilter-suffix, add more excluded file suffixes for - images - audio and video formats - software packages and archives - fonts
     new 7ffc667  Merge pull request #564 from sebastian-nagel/NUTCH-2845-urlfilter-suffix-rules
2cf4d62 is described below

commit 2cf4d62fbd239eaf70987eeaffd9963bef60c693
Author: Sebastian Nagel <sn...@apache.org>
AuthorDate: Thu Nov 7 08:59:50 2019 +0100

    NUTCH-2845 Complete rules of urlfilter-suffix,
    add more excluded file suffixes for
    - images
    - audio and video formats
    - software packages and archives
    - fonts
---
 conf/suffix-urlfilter.txt.template | 59 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 58 insertions(+), 1 deletion(-)
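
For readers unfamiliar with this kind of rule file, the following is a minimal Python sketch of how a list of excluded suffixes could be applied to URLs. It is an illustration only, not the urlfilter-suffix plugin's actual code: the names RULES, load_suffixes and is_excluded are hypothetical, the rule text is a small hand-picked subset of the suffixes added in this commit, and the sketch assumes case-insensitive matching against the URL path only, with '#' lines treated as comments.

```python
from urllib.parse import urlsplit

# Hypothetical subset of the suffixes added by this commit, for illustration.
RULES = """\
# pictures
.webp
.svg

# archives/packages
.bz2
.whl

# fonts
.woff2
"""


def load_suffixes(text):
    """Return the suffix entries, skipping blank lines and '#' comments."""
    suffixes = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            suffixes.add(line.lower())
    return suffixes


def is_excluded(url, suffixes):
    """Return True if the URL path ends with a listed suffix.

    Assumption of this sketch: only the path is compared (query string
    and fragment are ignored) and the comparison is case-insensitive.
    """
    path = urlsplit(url).path.lower()
    return any(path.endswith(suffix) for suffix in suffixes)


SUFFIXES = load_suffixes(RULES)
```

Under these assumptions, is_excluded("https://example.org/fonts/site.woff2", SUFFIXES) is true, while a URL such as "https://example.org/index.html" is not excluded.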

diff --git a/conf/suffix-urlfilter.txt.template b/conf/suffix-urlfilter.txt.template
index 6f02aed..e329f3c 100644
--- a/conf/suffix-urlfilter.txt.template
+++ b/conf/suffix-urlfilter.txt.template
@@ -19,13 +19,18 @@
 ### prohibit these
 # pictures
 .gif
+.gifv
 .jpg
 .jpeg
+.jp2
+.jpf
+.jpx
 .bmp
 .png
 .tif
 .tiff
 .ico
+.icns
 .eps
 .ps
 .wmf
@@ -38,13 +43,19 @@
 .psp
 .psd
 .tga
+.webp
 .xbm
 .xpm
+.kdc
+.svg
+.svgz
 
 # web-formats
 .css
+.js
 
 # archives/packages
+.apk
 .arj
 .arc
 .7z
@@ -52,14 +63,25 @@
 .lzw
 .lha
 .lzh
+.mar
 .zip
 .gz
 .tar
 .tgz
+.rar
 .sit
 .rpm
 .deb
+.udeb
 .pkg
+.bz2
+.dmg
+.lzma
+.xz
+.ipk
+.whl
+.egg
+.crx
 
 # audio/video
 .mid
@@ -68,11 +90,19 @@
 .mpeg
 .mpg
 .mpe
+.mp4
 .mp3
 .mp2
 .aac
 .mov
+.m4a
+.m4r
+.m4v
+.mp4a
+.mpga
+.f4v
 .fla
+.flac
 .flv
 .ra
 .ram
@@ -82,14 +112,41 @@
 .wmv
 .wav
 .wave
+.oga
 .ogg
+.webm
 .avi
+.avif
 .au
 .snd
+.3gp
+.3g2
+.qt
+.mka
+.mks
+.mkv
+.mk3d
+.opus
+.xm
+.m3u8
+.movie
+.aif
+.aiff
+.gblorb
+.xhr
 
-# executables
+# fonts
+.ttf
+.otf
+.pfb
+.afm
+.woff
+.woff2
+
+# executables and shared libraries
 .exe
 .com
+.dll
 
 # windows links
 .lnk