Posted to user@nutch.apache.org by alessio crisantemi <al...@gmail.com> on 2012/03/04 17:02:17 UTC

nutch crawling file system

Hi all,
I need to crawl a directory containing a lot of PDF files.
But I only know the step-by-step procedure for crawling a website.
How can I do it for a root directory?
Thank you for your help,
alessio
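For a local directory, the seed URLs use the file: scheme instead of http:. Below is a minimal sketch for writing such a seed list. It assumes the usual Nutch 1.x layout, where a seed directory holds a text file with one URL per line. The paths and helper names are hypothetical. Seeding only the root directory may already be enough, since protocol-file should return a directory listing whose links are then followed. Listing every PDF explicitly is an alternative that does not depend on crawl depth.

```python
from pathlib import Path


def pdf_seed_urls(root: str) -> list[str]:
    """Return a file: URL for the root directory plus one for every PDF below it."""
    base = Path(root).resolve()  # as_uri() needs an absolute path
    root_url = base.as_uri()
    if not root_url.endswith("/"):
        root_url += "/"  # trailing slash marks the seed as a directory
    urls = [root_url]
    # Walk the whole tree; keep regular files whose extension is .pdf in any case.
    for path in sorted(base.rglob("*")):
        if path.is_file() and path.suffix.lower() == ".pdf":
            urls.append(path.as_uri())
    return urls


def write_seed_file(root: str, seed_file: str) -> int:
    """Write the URLs one per line to seed_file and return how many were written."""
    urls = pdf_seed_urls(root)
    target = Path(seed_file)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(urls) + "\n", encoding="utf-8")
    return len(urls)
```

For example, write_seed_file("/data/pdfs", "urls/seed.txt") (hypothetical paths) would produce a seed directory that the crawl can be pointed at, the same way as for a website.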

Re: nutch crawling file system

Posted by alessio crisantemi <al...@gmail.com>.
Hi again,
I followed the Nutch tutorial (point 1), and Nutch crawls the directory but
doesn't index anything in Solr. Yet there are no errors in the Solr log and
no errors in the Hadoop log!

It just doesn't work...
Why, in your opinion?
Thank you,
alessio
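One way to narrow this down is to ask Solr directly how many documents it holds. A count of zero suggests that nothing was sent, or that nothing was committed. A non-zero count points at the query side instead. Below is a minimal sketch. It assumes Solr answers at http://localhost:8983/solr (the example default, hypothetical here) and supports the JSON response writer (wt=json). The helper names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(solr_base: str, query: str = "*:*") -> str:
    """Build a select URL that asks only for the hit count (rows=0), as JSON."""
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    return f"{solr_base.rstrip('/')}/select?{params}"


def num_found(response_body: str) -> int:
    """Read response.numFound from a Solr JSON select response."""
    return int(json.loads(response_body)["response"]["numFound"])


def count_indexed(solr_base: str, query: str = "*:*") -> int:
    """Query a running Solr instance and return how many documents match."""
    with urlopen(select_url(solr_base, query), timeout=10) as resp:
        return num_found(resp.read().decode("utf-8"))
```

For example, count_indexed("http://localhost:8983/solr") returns the number of documents currently visible to searches.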

On 4 March 2012 at 17:06, remi tassing <ta...@gmail.com> wrote:

> Plz try GOOGLing that first!
>
> If you don't find anything, then try these:
> [1] http://wiki.apache.org/nutch/FAQ#How_do_I_index_my_local_file_system.3F
> [2] http://www.folge2.de/tp/search/1/crawling-the-local-filesystem-with-nutch
> [3] http://stackoverflow.com/questions/941519/how-to-make-nutch-crawl-file-system

Re: nutch crawling file system

Posted by remi tassing <ta...@gmail.com>.
Why don't you try and let us know?

On Sun, Mar 4, 2012 at 6:05 PM, alessio crisantemi <
alessio.crisantemi@gmail.com> wrote:

> Thank you for the quick reply!
> I use Solr 1.4.1 and Nutch 1.4. Do these solutions work with those versions?
> Thanks,
> a.

Re: nutch crawling file system

Posted by Markus Jelsma <ma...@openindex.io>.
On Sun, 4 Mar 2012 18:05:04 +0100, alessio crisantemi
<al...@gmail.com> wrote:
> Thank you for the quick reply!
> I use Solr 1.4.1 and Nutch 1.4. Do these solutions work with those
> versions?
> Thanks

Most likely. Nothing significant has changed, and protocol-file is still
working fine.
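The protocol-file plugin is only loaded if it is listed in the plugin.includes property, which is normally overridden in conf/nutch-site.xml. Below is a minimal sketch for checking a config. It assumes the Nutch 1.x behaviour that a plugin is loaded when its id matches the whole plugin.includes regular expression. The sample value is hypothetical, not the shipped default.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional


def plugin_includes(config_xml: str) -> Optional[str]:
    """Return the plugin.includes value from a Hadoop-style config, or None if absent."""
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "plugin.includes":
            return (prop.findtext("value") or "").strip()
    return None


def plugin_enabled(includes: str, plugin_id: str) -> bool:
    """Assume a plugin is active when its id matches the whole regex."""
    return re.fullmatch(includes, plugin_id) is not None


# Hypothetical nutch-site.xml content with protocol-file added in front.
SAMPLE = """<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-file|protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>"""
```

With this sample, plugin_enabled(plugin_includes(SAMPLE), "protocol-file") is true, while an id absent from the expression, such as protocol-ftp, is not.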

-- 
 Markus Jelsma - CTO - Openindex
 http://www.linkedin.com/in/markus17
 050-8536600 / 06-50258350

Re: nutch crawling file system

Posted by alessio crisantemi <al...@gmail.com>.
Thank you for the quick reply!
I use Solr 1.4.1 and Nutch 1.4. Do these solutions work with those versions?
Thanks,
a.

Re: nutch crawling file system

Posted by remi tassing <ta...@gmail.com>.
Plz try GOOGLing that first!

If you don't find anything, then try these:
[1] http://wiki.apache.org/nutch/FAQ#How_do_I_index_my_local_file_system.3F
[2] http://www.folge2.de/tp/search/1/crawling-the-local-filesystem-with-nutch
[3] http://stackoverflow.com/questions/941519/how-to-make-nutch-crawl-file-system
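file: URLs also have to pass the URL filter before they are fetched at all. As far as I know, the stock regex-urlfilter.txt in Nutch 1.x contains a line -^(file|ftp|mailto): that rejects them, so that rule has to be edited. The sketch below mimics how such a rule file is evaluated. It assumes first-match-wins, with unmatched URLs rejected. /data/pdfs/ is a hypothetical directory, and the real filtering is done by Nutch's urlfilter-regex plugin, not by this code.

```python
import re


def parse_rules(text: str) -> list[tuple[bool, re.Pattern]]:
    """Parse regex-urlfilter style lines: '+regex' accepts, '-regex' rejects.

    Blank lines and '#' comments are skipped.
    """
    rules: list[tuple[bool, re.Pattern]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line[0] in "+-":
            rules.append((line[0] == "+", re.compile(line[1:])))
    return rules


def accepts(rules: list[tuple[bool, re.Pattern]], url: str) -> bool:
    """The first rule whose regex is found in the URL decides; no match means reject."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


# A rule like the one in the stock file: file: URLs are skipped.
DEFAULT_STYLE = """
# skip file: ftp: and mailto: urls
-^(file|ftp|mailto):
# accept anything else
+.
"""

# Hypothetical edit for a filesystem crawl limited to /data/pdfs/.
EDITED = """
-^(ftp|mailto):
+^file:///data/pdfs/
-.
"""
```

The trailing -. in the edited version rejects everything outside the chosen directory, so the crawl cannot wander into the rest of the filesystem.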

