Posted to commits@nutch.apache.org by pk...@apache.org on 2005/08/08 22:23:09 UTC

svn commit: r230874 - /lucene/nutch/trunk/conf/crawl-urlfilter.txt.template

Author: pkosiorowski
Date: Mon Aug  8 13:23:03 2005
New Revision: 230874

URL: http://svn.apache.org/viewcvs?rev=230874&view=rev
Log:
Rolled back skipping of pdf files as we have a plugin to handle them.

Modified:
    lucene/nutch/trunk/conf/crawl-urlfilter.txt.template

Modified: lucene/nutch/trunk/conf/crawl-urlfilter.txt.template
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/conf/crawl-urlfilter.txt.template?rev=230874&r1=230873&r2=230874&view=diff
==============================================================================
--- lucene/nutch/trunk/conf/crawl-urlfilter.txt.template (original)
+++ lucene/nutch/trunk/conf/crawl-urlfilter.txt.template Mon Aug  8 13:23:03 2005
@@ -12,7 +12,7 @@
 -^(file|ftp|mailto):
 
 # skip image and other suffixes we can't yet parse
--\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png|PNG|pdf|PDF)$
+-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png|PNG)$
 
 # skip URLs containing certain characters as probable queries, etc.
 -[?*!@=]
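The sketch below shows what this change does in practice. It is a minimal Python model of the rule list shown in the diff, not Nutch's own code, and it rests on these assumptions:

- Rules are checked top to bottom.
- The first pattern found anywhere in the URL decides the outcome: `+` includes, `-` excludes.
- A URL that matches no rule is rejected.
- The trailing `+.` catch-all, the `accepts` helper, and the example URLs are all hypothetical, for illustration only.

```python
import re

# Rule lines taken from the diff context. The "+." catch-all at the end is
# HYPOTHETICAL: it stands in for the site-specific accept rule(s) that a real
# crawl-urlfilter.txt would contain further down the file.
HEAD = [r"-^(file|ftp|mailto):"]
SUFFIX_BEFORE = (r"-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz"
                 r"|rpm|tgz|mov|MOV|exe|png|PNG|pdf|PDF)$")
SUFFIX_AFTER = (r"-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz"
                r"|rpm|tgz|mov|MOV|exe|png|PNG)$")
TAIL = [r"-[?*!@=]", r"+."]

RULES_BEFORE = HEAD + [SUFFIX_BEFORE] + TAIL  # revision 230873
RULES_AFTER = HEAD + [SUFFIX_AFTER] + TAIL    # revision 230874


def accepts(rules, url):
    """Hypothetical helper: the first rule whose regex is found in the URL
    decides ('+' accept, '-' reject); a URL matching no rule is rejected."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False


if __name__ == "__main__":
    for url in ["http://example.com/docs/manual.pdf",
                "http://example.com/logo.png",
                "http://example.com/search?q=nutch"]:
        print(url, accepts(RULES_BEFORE, url), accepts(RULES_AFTER, url))
```

With the old list, the `.pdf` URL is rejected by the suffix rule. With the new list, it falls through to the accept rule, so PDF documents are left to the parsing plugin mentioned in the log message. Images and query-like URLs are still rejected under both lists.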