Posted to user@nutch.apache.org by Zaihan <za...@unrealasia.net> on 2009/07/23 15:50:44 UTC
Pages with Specific URLS.
Hi All,
I'm sure I've read somewhere that URLs of the form
http://www.site.com/categories.asp?cid=25&page=9
can't be crawled. Is that true?
Warmest Regards,
Zaihan
Re: Pages with Specific URLS.
Posted by reinhard schwab <re...@aon.at>.
Why would that be?
Do you mean URLs that contain a query part?
They can be crawled.
However, the default Nutch configuration excludes them through this filter rule in
conf/crawl-urlfilter.txt:
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
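
To illustrate why that rule blocks the URL in question, here is a minimal Python sketch that mimics how such a rule list might be evaluated. It is not Nutch code: the helper url_allowed, the "first matching rule wins" behaviour, the trailing "+." accept-everything rule, and the relaxed rule "-[*!@]" are all assumptions made for the example.

```python
import re

def url_allowed(url, rules):
    """Hypothetical helper (not part of Nutch): apply '+regex' / '-regex'
    rules in order; the first rule whose regex is found in the URL decides."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    # Assumption: a URL that matches no rule is rejected.
    return False

# The rule quoted above, followed by an assumed catch-all accept rule.
default_rules = ["-[?*!@=]", "+."]

# Assumed relaxed variant: '?' and '=' dropped from the character class.
relaxed_rules = ["-[*!@]", "+."]

url = "http://www.site.com/categories.asp?cid=25&page=9"

print(url_allowed(url, default_rules))  # False: the URL contains '?' and '='
print(url_allowed(url, relaxed_rules))  # True: no '*', '!' or '@' in the URL
```

Inside a character class, '?' and '*' are literal characters, so the rule simply rejects any URL containing one of ? * ! @ =. Under these assumptions, removing '?' and '=' from the class (or commenting the rule out) would let URLs with a query part through.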
Zaihan wrote:
> Hi All,
>
> I'm sure I've read somewhere before that URLs that is made like
> http://www.site.com/categories.asp?cid=25&page=9
>
> Can't be crawled. Is that true?
>
> Warmest Regards,
> Zaihan