Posted to commits@nutch.apache.org by sn...@apache.org on 2021/02/01 10:02:15 UTC
[nutch] branch master updated: NUTCH-2845 Complete rules of urlfilter-suffix, add more excluded file suffixes for - images - audio and video formats - software packages and archives - fonts
This is an automated email from the ASF dual-hosted git repository.
snagel pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git
The following commit(s) were added to refs/heads/master by this push:
new 2cf4d62 NUTCH-2845 Complete rules of urlfilter-suffix, add more excluded file suffixes for - images - audio and video formats - software packages and archives - fonts
new 7ffc667 Merge pull request #564 from sebastian-nagel/NUTCH-2845-urlfilter-suffix-rules
2cf4d62 is described below
commit 2cf4d62fbd239eaf70987eeaffd9963bef60c693
Author: Sebastian Nagel <sn...@apache.org>
AuthorDate: Thu Nov 7 08:59:50 2019 +0100
NUTCH-2845 Complete rules of urlfilter-suffix,
add more excluded file suffixes for
- images
- audio and video formats
- software packages and archives
- fonts
---
conf/suffix-urlfilter.txt.template | 59 +++++++++++++++++++++++++++++++++++++-
1 file changed, 58 insertions(+), 1 deletion(-)
diff --git a/conf/suffix-urlfilter.txt.template b/conf/suffix-urlfilter.txt.template
index 6f02aed..e329f3c 100644
--- a/conf/suffix-urlfilter.txt.template
+++ b/conf/suffix-urlfilter.txt.template
@@ -19,13 +19,18 @@
### prohibit these
# pictures
.gif
+.gifv
.jpg
.jpeg
+.jp2
+.jpf
+.jpx
.bmp
.png
.tif
.tiff
.ico
+.icns
.eps
.ps
.wmf
@@ -38,13 +43,19 @@
.psp
.psd
.tga
+.webp
.xbm
.xpm
+.kdc
+.svg
+.svgz
# web-formats
.css
+.js
# archives/packages
+.apk
.arj
.arc
.7z
@@ -52,14 +63,25 @@
.lzw
.lha
.lzh
+.mar
.zip
.gz
.tar
.tgz
+.rar
.sit
.rpm
.deb
+.udeb
.pkg
+.bz2
+.dmg
+.lzma
+.xz
+.ipk
+.whl
+.egg
+.crx
# audio/video
.mid
@@ -68,11 +90,19 @@
.mpeg
.mpg
.mpe
+.mp4
.mp3
.mp2
.aac
.mov
+.m4a
+.m4r
+.m4v
+.mp4a
+.mpga
+.f4v
.fla
+.flac
.flv
.ra
.ram
@@ -82,14 +112,41 @@
.wmv
.wav
.wave
+.oga
.ogg
+.webm
.avi
+.avif
.au
.snd
+.3gp
+.3g2
+.qt
+.mka
+.mks
+.mkv
+.mk3d
+.opus
+.xm
+.m3u8
+.movie
+.aif
+.aiff
+.gblorb
+.xhr
-# executables
+# fonts
+.ttf
+.otf
+.pfb
+.afm
+.woff
+.woff2
+
+# executables and shared libraries
.exe
.com
+.dll
# windows links
.lnk
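
For readers unfamiliar with suffix-based URL filtering, the effect of the added rules can be illustrated with a minimal sketch. It is not the urlfilter-suffix plugin's implementation. It assumes a simplified reading of the file: one suffix per line, `#` starts a comment, blank lines are ignored, and every listed suffix is excluded. The names `SAMPLE_RULES`, `parse_suffixes`, and `is_excluded` are hypothetical. Matching case-insensitively against the URL path only is also an assumption of this sketch; the actual plugin may be configured differently.

```python
from urllib.parse import urlsplit

# Hypothetical excerpt built from a few suffixes in the diff above;
# this is not the full template file.
SAMPLE_RULES = """\
# pictures
.gif
.webp
.svg

# fonts
.woff
.woff2

# archives/packages
.bz2
"""


def parse_suffixes(text):
    """Return the set of suffixes, skipping blank lines and '#' comments."""
    suffixes = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        suffixes.add(line.lower())
    return suffixes


def is_excluded(url, suffixes):
    """Return True if the URL path ends with one of the suffixes.

    Assumption of this sketch: only the path is checked (query string and
    fragment are ignored) and the comparison is case-insensitive.
    """
    path = urlsplit(url).path.lower()
    return path.endswith(tuple(suffixes))


if __name__ == "__main__":
    rules = parse_suffixes(SAMPLE_RULES)
    for url in (
        "https://example.org/fonts/a.woff2",
        "https://example.org/index.html",
        "https://example.org/IMG.GIF?x=1",
    ):
        print(url, "excluded" if is_excluded(url, rules) else "kept")
```

Under these assumptions, a URL such as `https://example.org/fonts/a.woff2` would be dropped before fetching, while `https://example.org/index.html` would pass through.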