Posted to user@nutch.apache.org by Valmir Macário <va...@gmail.com> on 2005/09/16 15:41:59 UTC

index local system

Hi all,

I'm using Solaris and trying to index my local system. I followed all the steps in 
the FAQ but still haven't succeeded. Is the FAQ missing a step, or is something in 
it wrong? I would appreciate it if someone could help me; my objective is 
to index a local system on an intranet. Thanks

Re: index local system

Posted by cf-auto <cf...@folge2.de>.
Hi,

Can you tell us more about what is not working?
It would also be helpful to see your config files.

christoph
 


Re: index local system

Posted by cf-auto <cf...@folge2.de>.
Hi Valmir, Adriano

I too had some problems with crawling the local filesystem.
I wrote a small document about what I've done in order to get 
things working for me.

http://www.folge2.de/tp/search/1/crawling-the-local-filesystem-with-nutch

bye
c

On Monday, 19.09.2005 at 21:19 +0300, Valmir Macário wrote:
> Alexander, Christoph and all,
> 
> When I ran the crawl command, it gave this error:
> 
> 050919 092356 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
> 050919 092356 parsing: /files/home/vmf/nutch-0.7/plugins/query-url/plugin.xml
> 050919 092356 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
> 050919 092356 not including: /files/home/vmf/nutch-0.7/plugins/urlfilter-regex
> 050919 092356 not including: /files/home/vmf/nutch-0.7/plugins/urlfilter-prefix
> Exception in thread "main" java.lang.ExceptionInInitializerError
>     at org.apache.nutch.db.WebDBInjector.addPage(WebDBInjector.java:437)
>     at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:378)
>     at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
>     at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
> Caused by: java.lang.RuntimeException: org.apache.nutch.net.URLFilter not found.
>     at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:44)
>     ... 4 more
> 
> I fixed it by putting this in nutch-site.xml:
> 
> <property>
> <name>plugin.includes</name>
> <value>protocol-file|protocol-http|parse-(text|html|msword|pdf)|index-basic|query-(basic|site|url)|urlfilter-regex</value>
> </property>
> 
> My urls.txt file is: file:/export/home/vmf
> 
> but it is indexing everything beyond the home directory.
> 
> How can I index another account on the intranet?
> 
> I tried putting the IP in crawl-urlfilter.txt, but without success.
> 
> Can someone give me a suggestion, please?
> 
> Thanks, Valmir
> 
> 


Re: index local system

Posted by Valmir Macário <va...@gmail.com>.
Thank you very much, Christoph; this tutorial was very useful. I managed to do
what I was trying to do, but I changed my approach a little: I am going to
transfer all the files to the server and index them locally. The problem of
indexing everything still isn't resolved. It comes down to the depth of the
search, but it is difficult to know the number of sub-folders to index. Thank you.

Valmir
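
Since the number of nested sub-folders is hard to know in advance, one option is to measure it before choosing a depth. The following is a minimal Python sketch, assuming the crawl depth needs to be at least the deepest folder level below the start directory. The function name max_depth is hypothetical and is not part of Nutch.

```python
import os


def max_depth(root):
    """Return how many directory levels lie below root (0 if it has no sub-folders)."""
    root = os.path.abspath(root)
    # Number of path separators in the start directory itself.
    base = root.rstrip(os.sep).count(os.sep)
    deepest = 0
    for dirpath, _dirnames, _filenames in os.walk(root):
        # Depth of this directory relative to the start directory.
        depth = dirpath.rstrip(os.sep).count(os.sep) - base
        deepest = max(deepest, depth)
    return deepest
```

The crawl may need a slightly larger value than this number, since files are reached by following links from directory listings.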


Re: index local system

Posted by Valmir Macário <va...@gmail.com>.
Alexander, Christoph and all,

When I ran the crawl command, it gave this error:

050919 092356 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
050919 092356 parsing: /files/home/vmf/nutch-0.7/plugins/query-url/plugin.xml
050919 092356 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
050919 092356 not including: /files/home/vmf/nutch-0.7/plugins/urlfilter-regex
050919 092356 not including: /files/home/vmf/nutch-0.7/plugins/urlfilter-prefix
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.nutch.db.WebDBInjector.addPage(WebDBInjector.java:437)
    at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:378)
    at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
    at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
Caused by: java.lang.RuntimeException: org.apache.nutch.net.URLFilter not found.
    at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:44)
    ... 4 more

I fixed it by putting this in nutch-site.xml:

<property>
<name>plugin.includes</name>
<value>protocol-file|protocol-http|parse-(text|html|msword|pdf)|index-basic|query-(basic|site|url)|urlfilter-regex</value>
</property>
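
The log above reports "not including" for urlfilter-regex and urlfilter-prefix, and the crawl then fails because no URLFilter is found, which is consistent with the fix of listing urlfilter-regex in plugin.includes. As a rough illustration, assuming the plugin.includes value is treated as a regular expression that a plugin id must match in full, this Python sketch checks which plugin ids the value above would enable. The helper name included is hypothetical and is not part of Nutch.

```python
import re

# The plugin.includes value from the nutch-site.xml property above.
PLUGIN_INCLUDES = (
    "protocol-file|protocol-http|parse-(text|html|msword|pdf)"
    "|index-basic|query-(basic|site|url)|urlfilter-regex"
)


def included(plugin_id, includes=PLUGIN_INCLUDES):
    """Return True if the whole plugin id matches the includes expression (assumed semantics)."""
    return re.fullmatch(includes, plugin_id) is not None


for plugin_id in ["protocol-file", "parse-pdf", "urlfilter-regex", "urlfilter-prefix"]:
    print(plugin_id, "included" if included(plugin_id) else "not included")
```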


My urls.txt file is: file:/export/home/vmf

but it is indexing everything beyond the home directory.

How can I index another account on the intranet?

I tried putting the IP in crawl-urlfilter.txt, but without success.

Can someone give me a suggestion, please?

Thanks, Valmir
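
For restricting a crawl to one directory tree, here is a minimal sketch of how a rule list in the style of crawl-urlfilter.txt could behave. It assumes each rule is a "+" (accept) or "-" (reject) followed by a regular expression, that the first matching rule decides, and that a URL matching no rule is rejected. The rules themselves, parse_rules and accepts are illustrative assumptions written in Python, not Nutch's own code or its default filter file.

```python
import re

# Hypothetical rule list; the directory is the one from urls.txt above.
RULES = """
# reject URLs containing characters that usually indicate queries
-[?*!@=]
# accept the start directory and everything below it
+^file:/export/home/vmf(/|$)
# reject everything else
-.
"""


def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (allow, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first rule whose pattern is found in the URL."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False  # no rule matched: reject


rules = parse_rules(RULES)
print(accepts(rules, "file:/export/home/vmf/docs/report.pdf"))
print(accepts(rules, "file:/export/home/"))
```

With rules like these, a link from a directory listing back to the parent directory would be rejected, so the crawl would stay inside the start directory.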



Re: index local system

Posted by Alexander Genaud <al...@gmail.com>.
Valmir,

I am doing a similar thing, though not the entire hard drive, just a
large section. I'm taking a website offline, indexing it and dropping
the whole thing onto a CD. I've downloaded Jetty, placed the files I
want to index in the webapps directory and pointed Nutch at
http://localhost:8080/

Then I can place nutch.war in webapps and with a few little tweaks, I
can do local search.
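
The same idea (serving the files over HTTP on localhost so that a crawler can fetch them as ordinary web pages) can be tried out without Jetty. The following is a minimal sketch using Python's built-in http.server module as a stand-in; it is not the Jetty setup described above, and serve_directory is a hypothetical helper.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8080):
    """Serve the given directory over HTTP on localhost in a background thread."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server  # call server.shutdown() and server.server_close() when done
```

Passing port=0 lets the operating system pick a free port, which can then be read from server.server_address.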

Incidentally, does anyone have suggestions for pre-compiling Nutch so
that the servlet container doesn't have to compile (the JSPs?) at
run-time? (I've compiled two JSPs [search.jsp] as servlets, but it still
doesn't seem to work on machines lacking the JDK.) I ask because I'd
like to distribute a CD-ROM with the JRE but not the JDK.

Cheers,
Alex



-- 
http://cph.blogsome.com

CCC7 D19D D107 F079 2F3D BF97 8443 DB5A 6DB8 9CE1