Posted to user@nutch.apache.org by "K.A.Hussain Ali" <Hu...@photoninfotech.com> on 2005/12/08 15:31:56 UTC
Crawling listing (pagination) pages.
Hi all,
Does Nutch crawl the pages linked from listing pages (pages with pagination, as in search engine results)?
While crawling with Nutch, I need to fetch the pages that the pagination links lead to, without increasing the depth of the whole crawl.
Does Nutch provide a plugin for this?
Is there any way to solve this?
Any help is greatly appreciated
Thanks in advance
regards
-Hussain
Re: Crawling listing (pagination) pages.
Posted by "K.A.Hussain Ali" <Hu...@photoninfotech.com>.
Hi Jack,
The way you mentioned is one way to sort out the problem,
but should we check each URL against a regular expression during
crawling, and is that possible?
Or should the check happen while indexing?
Any help is appreciated.
Thanks in advance
regards
----- Original Message -----
From: "Jack Tang" <hi...@gmail.com>
To: <nu...@lucene.apache.org>; "K.A.Hussain Ali"
<Hu...@photoninfotech.com>
Sent: Thursday, December 08, 2005 8:05 PM
Subject: Re: Crawling listing (pagination) pages.
Hi
I am facing the same problem. However, my crawl focuses on only some
websites, so I recognize the pagination URLs using a regexp and inject
them in every fetch cycle.
/Jack
--
Keep Discovering ... ...
http://www.jroller.com/page/jmars
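On the question of checking each URL against a regular expression during crawling, the idea can be sketched as an ordered list of accept/reject patterns where the first matching rule decides. This is only an illustrative Python sketch, not Nutch code: the example.com URLs, the RULES list, and the accept() function are all hypothetical names made up for this example, and the "+"/"-" prefixes are just a convention chosen here for marking accept and reject rules.

```python
import re

# Hypothetical rules: ("+", pattern) accepts, ("-", pattern) rejects.
# The first rule whose pattern matches the URL decides the outcome.
RULES = [
    ("-", r"\.(gif|jpg|png|css|js)$"),                     # skip static assets
    ("+", r"^http://www\.example\.com/list\?page=\d+$"),   # pagination pages
    ("+", r"^http://www\.example\.com/item/\d+$"),         # detail pages
]

COMPILED = [(sign, re.compile(pattern)) for sign, pattern in RULES]


def accept(url):
    """Return True if the URL should be fetched, False otherwise."""
    for sign, pattern in COMPILED:
        if pattern.search(url):
            return sign == "+"
    # Assumption for this sketch: URLs that match no rule are rejected.
    return False


if __name__ == "__main__":
    for candidate in (
        "http://www.example.com/list?page=3",
        "http://www.example.com/item/42",
        "http://www.example.com/logo.png",
        "http://www.example.com/about",
    ):
        print(candidate, accept(candidate))
```

Applying a check like this while links are being collected keeps unwanted URLs out of the fetch list, so the pagination pages can be followed without the crawl widening to everything else at the same depth.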
Re: Crawling listing (pagination) pages.
Posted by Jack Tang <hi...@gmail.com>.
Hi
I am facing the same problem. However, my crawl focuses on only some
websites, so I recognize the pagination URLs using a regexp and inject
them in every fetch cycle.
/Jack
--
Keep Discovering ... ...
http://www.jroller.com/page/jmars
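The approach Jack describes (recognize pagination URLs with a regexp, then inject them in every fetch cycle) might look roughly like the following. This is a minimal sketch under stated assumptions, not Jack's actual code: the URL pattern, the function names, and the seed-file layout (one URL per line) are hypothetical, and the step that hands the file to Nutch is left out because it depends on the Nutch version in use.

```python
import re

# Hypothetical pattern for one site's pagination links.
PAGINATION_RE = re.compile(r"^http://www\.example\.com/list\?page=\d+$")


def pagination_seeds(outlinks):
    """Return the pagination URLs among outlinks, de-duplicated,
    in the order they were first seen."""
    seen = set()
    seeds = []
    for url in outlinks:
        if PAGINATION_RE.match(url) and url not in seen:
            seen.add(url)
            seeds.append(url)
    return seeds


def write_seed_file(path, urls):
    """Write one URL per line, ready to be injected before the next cycle."""
    with open(path, "w", encoding="utf-8") as handle:
        for url in urls:
            handle.write(url + "\n")


if __name__ == "__main__":
    # Outlinks gathered during one fetch cycle (made-up data).
    found = [
        "http://www.example.com/list?page=2",
        "http://www.example.com/about",
        "http://www.example.com/list?page=3",
        "http://www.example.com/list?page=2",
    ]
    write_seed_file("pagination_seeds.txt", pagination_seeds(found))
```

Running this after each fetch cycle and injecting the resulting file means the pagination pages are always scheduled, independent of the depth setting for the rest of the crawl.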