Posted to dev@nutch.apache.org by Boris Kroeger <bo...@cip.wiwi.uni-karlsruhe.de> on 2005/04/16 13:21:51 UTC
filename problem during local filesystem crawl
Hi,
I observed that Nutch seems to have problems with certain filenames in a local
filesystem crawl.
One of my files had an exclamation mark (!) in its name and was not processed
by the Nutch crawl.
After I removed the exclamation mark from the name, Nutch was able to process
the file.
Maybe there are further characters that cause this?
Is this worth an issue in JIRA?
regards
Boris
Re: [Nutch-dev] filename problem during local filesystem crawl
Posted by Kragen Sitaker <ks...@commerce.net>.
On Sat, 2005-04-16 at 13:21 +0200, Boris Kroeger wrote:
> One file of mine contained an exclamation mark (!) and was not processed
> by the nutch crawl.
> After I removed it nutch was able to process it.
> May be there are further characters?
>
> Is this worth an issue in JIRA?
See crawl-urlfilter.txt and regex-urlfilter.txt:
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
(This is another minor but important difference between filesystem indexing
and web indexing.)
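
To illustrate why a filename containing "!" gets skipped, here is a minimal Python sketch of how a rule list in the style of regex-urlfilter.txt could be evaluated. It is not Nutch's actual code. It assumes that the first rule whose pattern is found anywhere in the URL decides the outcome, and that a URL matching no rule is rejected. The rule "-[?*!@=]" is the one quoted above; the trailing "+." catch-all, the function names parse_rules and accepts, and the example file paths are hypothetical and chosen only for this illustration.

```python
import re

# Rules in the style of regex-urlfilter.txt: a leading '+' accepts, '-' rejects.
# The "-[?*!@=]" rule is quoted from the thread; the "+." catch-all is an
# assumption added so that ordinary URLs are accepted in this sketch.
RULES = """
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
+.
"""


def parse_rules(text):
    """Turn rule lines into (accept, compiled_pattern) pairs, skipping comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(url, rules):
    """Return the verdict of the first matching rule (assumed semantics)."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # assumption: a URL that matches no rule is rejected


if __name__ == "__main__":
    rules = parse_rules(RULES)
    # Hypothetical file URLs, one with and one without an exclamation mark.
    for url in ("file:///home/boris/notes.txt", "file:///home/boris/hello!.txt"):
        print(url, "accepted" if accepts(url, rules) else "skipped")
```

Under these assumptions, the second URL is skipped only because of the "!" in the character class. For a filesystem-only crawl, removing "!" from that class (for example "-[?*@=]") in the filter file would presumably let such filenames through, at the cost of no longer filtering those URLs in a web crawl.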