Posted to user@nutch.apache.org by Sandy Polanski <sa...@yahoo.com> on 2006/09/01 03:44:37 UTC

Re: indexing folders with nutch

Cam, try increasing the depth and see what happens.
Logic would suggest that they're all on the same
directory depth/level; however, just give it a try,
because I ran into a similar problem, and if I'm
not mistaken, increasing the depth fixed it.
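
For example (the depth and topN values here are
just starting points to experiment with, not tuned
numbers):

    bin/nutch crawl urls -dir crawl -depth 3 -topN 1000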

--- Cam Bazz <ca...@gmail.com> wrote:

> Hello,
> 
> I have a problem. I tried to index some local
> files with Nutch.
> 
> What I have done is put them (HTML files) on a
> local Apache server and create a urls file that
> contains http://localhost/file01.html etc.
> 
> then I do: nutch crawl urls -dir crawl -depth 1
> 
> but the crawl stalls after a while, and nothing
> happens.
> 
> I also tried -topN 10000
> 
> isn't there a more convenient way of indexing
> from the file system?
> 
> Best regards,
> -C.B.
> 
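
As for indexing straight from the file system:
Nutch also ships a protocol-file plugin, so you can
skip Apache entirely. A rough sketch only (the
default plugin list and filter file vary by
version, and the paths below are just examples):
add protocol-file to plugin.includes in
conf/nutch-site.xml:

    <property>
      <name>plugin.includes</name>
      <value>protocol-file|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
    </property>

then edit conf/crawl-urlfilter.txt so file: URLs
are no longer skipped, and seed with something
like file:///home/you/docs/:

    # default rule skips file: URLs:
    # -^(file|ftp|mailto):
    # loosen it to:
    -^(ftp|mailto):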



Re: indexing folders with nutch

Posted by Lourival Júnior <ju...@gmail.com>.
Yes Cam, with a depth of 1 you will crawl only the first document. With a
depth of 2 you will crawl the first document plus all the links found in it.
With depth 3, you will crawl the first one, its links, and all the links
found in cycle 2. And so on. Increasing your depth will also grow your
WebDB. Try it ;)
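
Under the hood, depth is just the number of
generate/fetch/update rounds the crawl command
runs. Roughly equivalent by hand (0.8-era commands;
the older WebDB tooling differs, so treat this as
a sketch):

    bin/nutch inject crawl/crawldb urls
    # each pass of this loop is one "depth"
    for i in 1 2 3; do
      bin/nutch generate crawl/crawldb crawl/segments -topN 10000
      s=`ls -d crawl/segments/* | tail -1`
      bin/nutch fetch $s
      bin/nutch updatedb crawl/crawldb $s
    done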

Regards

On 8/31/06, Sandy Polanski <sa...@yahoo.com> wrote:
>
> Cam, try increasing the depth and see what happens.
> [...]
>



-- 
Lourival Junior
Universidade Federal do Pará
Bachelor's Program in Information Systems
http://www.ufpa.br/cbsi
Msn: junior_ufpa@hotmail.com