Posted to user@nutch.apache.org by ad...@interfree.it on 2005/09/14 10:37:24 UTC

does Nutch crawl dynamic pages???

Hi,

I have some questions:

1) Does anyone know the limitations of Nutch?
2) I have a site whose frames are served by servlets. Is it possible to crawl these pages?
We have also seen that if the frame is an HTML page, the Nutch crawler works, but if the frame is a servlet, it does not.
Could someone please reply?

                     Adriano




Re: does Nutch crawl dynamic pages???

Posted by Jack Tang <hi...@gmail.com>.
Commenting out this line also works:
#-[?*!@=]

/Jack

On 9/14/05, mu xiaofeng <he...@gmail.com> wrote:
> Yes, edit your crawl-urlfilter.txt.
> 
> You should be able to get it to work by changing this:
> 
> # skip URLs containing certain characters as probable queries, etc.
> -[?*!@=]
> 
> To this:
> 
> # skip URLs containing certain characters as probable queries, etc.
> -[*!@]
> 


-- 
Keep Discovering ... ...
http://www.jroller.com/page/jmars

Re: does Nutch crawl dynamic pages???

Posted by mu xiaofeng <he...@gmail.com>.
Yes, edit your crawl-urlfilter.txt.

You should be able to get it to work by changing this:

# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]

To this:

# skip URLs containing certain characters as probable queries, etc.
-[*!@]
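
To see why this change matters, here is a minimal Python sketch of how a rule list like the one in crawl-urlfilter.txt can be evaluated. It is not Nutch's own code: it assumes each rule is a "+" or "-" followed by a regex, that the first rule matching anywhere in the URL decides, and that a URL matching no rule is dropped. The function names, the catch-all "+." rule, and the example.com URLs are hypothetical, chosen only for illustration.

```python
import re


def parse_rules(text):
    """Parse '+regex' / '-regex' lines, skipping blank lines and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first matching rule (assumed semantics)."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False  # assumed: a URL that matches no rule is not crawled


# The original rule, followed by a hypothetical accept-everything rule.
DEFAULT = """
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
+.
"""

# The relaxed rule: '?' and '=' no longer cause a URL to be skipped.
RELAXED = """
# skip URLs containing certain characters as probable queries, etc.
-[*!@]
+.
"""

# The rule commented out entirely.
COMMENTED = """
#-[?*!@=]
+.
"""

# Hypothetical servlet URL with a query string.
servlet_url = "http://example.com/servlet/page?id=1"

for name, text in (("default", DEFAULT), ("relaxed", RELAXED), ("commented", COMMENTED)):
    print(name, accepts(parse_rules(text), servlet_url))
```

Under these assumptions the servlet URL is rejected by the original rule and accepted once "?" and "=" are removed from the character class or the rule is commented out. The relaxed rule still skips URLs containing "*", "!" or "@".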
