Posted to user@nutch.apache.org by Zaihan <za...@unrealasia.net> on 2009/07/23 15:50:44 UTC

Pages with Specific URLS.

Hi All,

I'm sure I've read somewhere that URLs of the form
http://www.site.com/categories.asp?cid=25&page=9

can't be crawled. Is that true?

Warmest Regards,
Zaihan




Re: Pages with Specific URLS.

Posted by reinhard schwab <re...@aon.at>.
Why?
Do you mean URLs that contain a query part?

They can be crawled.
The default Nutch configuration excludes them with this filter rule in
conf/crawl-urlfilter.txt:

# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
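The rule above is a regular expression prefixed with `-` (reject), so any URL containing one of the characters ?, *, !, @ or = is dropped. The example URL contains both ? and =, which is why it never gets fetched. The Java sketch here mimics that behaviour under stated assumptions: rules are tried in order, the first rule whose pattern occurs anywhere in the URL decides, and a URL matching no rule is rejected. It is not Nutch's own URL filter API. The class and method names, the catch-all `+.` accept rule, and the relaxed `-[*!@]` variant are all hypothetical illustrations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of a "+regex" / "-regex" URL filter; not Nutch's API.
public class UrlFilterSketch {

    // One filter rule: accept ('+') or reject ('-') plus a compiled regex.
    record Rule(boolean accept, Pattern pattern) {}

    // Parses "+regex" / "-regex" lines, skipping blank lines and '#' comments.
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    // The first rule whose pattern occurs anywhere in the URL decides;
    // a URL that matches no rule is rejected.
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern().matcher(url).find()) {
                return rule.accept();
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String url = "http://www.site.com/categories.asp?cid=25&page=9";

        // The rule quoted above, followed by a hypothetical catch-all accept rule.
        List<Rule> defaults = parse(List.of(
                "# skip URLs containing certain characters as probable queries, etc.",
                "-[?*!@=]",
                "+."));

        // Hypothetical relaxed variant: '?' and '=' removed from the character class.
        List<Rule> relaxed = parse(List.of(
                "-[*!@]",
                "+."));

        System.out.println("default: " + accepts(defaults, url));
        System.out.println("relaxed: " + accepts(relaxed, url));
    }
}
```

Run as-is, this should print `default: false` and then `relaxed: true`: the default rule rejects the URL because it contains ? and =, while the relaxed rule no longer lists those characters, so the URL falls through to the accept rule. A similar edit to the rule in conf/crawl-urlfilter.txt is one way to let query URLs through, but check it against the other rules in your own filter file.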

