Posted to commits@nutch.apache.org by pk...@apache.org on 2005/08/08 22:23:09 UTC
svn commit: r230874 - /lucene/nutch/trunk/conf/crawl-urlfilter.txt.template
Author: pkosiorowski
Date: Mon Aug 8 13:23:03 2005
New Revision: 230874
URL: http://svn.apache.org/viewcvs?rev=230874&view=rev
Log:
Rolled back skipping of pdf files as we have a plugin to handle them.
Modified:
lucene/nutch/trunk/conf/crawl-urlfilter.txt.template
Modified: lucene/nutch/trunk/conf/crawl-urlfilter.txt.template
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/conf/crawl-urlfilter.txt.template?rev=230874&r1=230873&r2=230874&view=diff
==============================================================================
--- lucene/nutch/trunk/conf/crawl-urlfilter.txt.template (original)
+++ lucene/nutch/trunk/conf/crawl-urlfilter.txt.template Mon Aug 8 13:23:03 2005
@@ -12,7 +12,7 @@
 -^(file|ftp|mailto):
 # skip image and other suffixes we can't yet parse
--\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png|PNG|pdf|PDF)$
+-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png|PNG)$
 # skip URLs containing certain characters as probable queries, etc.
 -[?*!@=]
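
Note on the effect of this change: the sketch below is not part of the commit. It is a minimal Python illustration of how the three exclusion rules visible in this hunk treat a few URLs before and after the revision. It assumes that rules are tried in order, that a pattern matches when it is found anywhere in the URL, and that the first matching rule decides. The helper names (build_rules, first_match) and the example.com URLs are hypothetical, and the rest of the template file is not modelled, so a result of None only means that none of these three rules matched.

```python
import re

# Suffix rule before and after r230874, copied from the diff above.
SUFFIXES_BEFORE = (r"\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls"
                   r"|gz|rpm|tgz|mov|MOV|exe|png|PNG|pdf|PDF)$")
SUFFIXES_AFTER = (r"\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls"
                  r"|gz|rpm|tgz|mov|MOV|exe|png|PNG)$")


def build_rules(suffix_pattern):
    """Return the three '-' (skip) rules from the hunk, in file order."""
    return [
        ("-", re.compile(r"^(file|ftp|mailto):")),
        ("-", re.compile(suffix_pattern)),
        ("-", re.compile(r"[?*!@=]")),
    ]


def first_match(url, rules):
    """Return the sign of the first rule found in url, or None if no rule matches.

    Assumption: first-match-wins ordering; rules outside the hunk are ignored.
    """
    for sign, pattern in rules:
        if pattern.search(url):
            return sign
    return None


before = build_rules(SUFFIXES_BEFORE)
after = build_rules(SUFFIXES_AFTER)

for url in ("http://example.com/doc.pdf",
            "http://example.com/logo.png",
            "http://example.com/doc.pdf?x=1"):
    print(url, first_match(url, before), first_match(url, after))
```

Under these assumptions, a URL ending in .pdf or .PDF is no longer excluded by the suffix rule, while a PDF URL that carries a query string is still excluded by the character rule that follows it.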