Posted to user@nutch.apache.org by Valmir Macário <va...@gmail.com> on 2005/09/16 15:41:59 UTC
index local system
Hi all,
I'm using Solaris and trying to index my local file system. I followed all the steps in
the FAQ but still haven't succeeded. Is the FAQ missing a step, or is
something in it wrong? I would appreciate it if someone could help me. My goal is
to index the local file system on an intranet. Thanks
Re: index local system
Posted by cf-auto <cf...@folge2.de>.
Hi,
can you tell us more about what is not working?
It would also be helpful to see your config files.
christoph
Re: index local system
Posted by cf-auto <cf...@folge2.de>.
Hi Valmir, Adriano
I too had some problems with crawling the local filesystem.
I wrote a small document about what I've done in order to get
things working for me.
http://www.folge2.de/tp/search/1/crawling-the-local-filesystem-with-nutch
bye
c
On Monday, 19.09.2005, at 21:19 +0300, Valmir Macário wrote:
> Alexander, Christoph and all,
>
> When I ran the crawl command, it gave this error:
>
> 050919 092356 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
> 050919 092356 parsing: /files/home/vmf/nutch-0.7/plugins/query-url/plugin.xml
> 050919 092356 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
> 050919 092356 not including: /files/home/vmf/nutch-0.7/plugins/urlfilter-regex
> 050919 092356 not including: /files/home/vmf/nutch-0.7/plugins/urlfilter-prefix
> Exception in thread "main" java.lang.ExceptionInInitializerError
>     at org.apache.nutch.db.WebDBInjector.addPage(WebDBInjector.java:437)
>     at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:378)
>     at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
>     at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
> Caused by: java.lang.RuntimeException: org.apache.nutch.net.URLFilter not found.
>     at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:44)
>     ... 4 more
>
> I fixed it by putting this in nutch-site.xml:
>
> <property>
> <name>plugin.includes</name>
> <value>protocol-file|protocol-http|parse-(text|html|msword|pdf)|index-basic|query-(basic|site|url)|urlfilter-regex</value>
> </property>
>
> My urls.txt file contains: file:/export/home/vmf
>
> but it is indexing everything beyond the home directory.
>
> How can I index another account on the intranet?
>
> I tried putting the IP address in crawl-urlfilter.txt, but without success.
>
> Can someone give me a suggestion, please?
>
> Thanks, Valmir
Re: index local system
Posted by Valmir Macário <va...@gmail.com>.
Thank you very much, this tutorial was very useful, Christoph. I managed to do
what I was trying to do, but I am changing my approach a little: I am going to
transfer all the files to the server and index them locally there. The
problem of indexing everything is still not resolved. It depends on the crawl
depth, but it is difficult to know how many levels of sub-folders there are to index. Thank you.
Valmir
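One way to take the guesswork out of the depth question is to measure how deeply the folders are actually nested before starting the crawl. The following is only a minimal sketch in Python, not part of Nutch; the function name max_depth is made up for illustration, and how the result maps onto the crawl depth setting is an assumption (presumably each crawl round descends one directory level, so the depth would need to be at least this number plus one for the files in the deepest folder).

```python
import os
import sys


def max_depth(root):
    """Return how many directory levels lie below root (0 if it has no sub-folders)."""
    root = os.path.abspath(root)
    # Number of path separators in the root itself, used as the baseline.
    base = root.rstrip(os.sep).count(os.sep)
    deepest = 0
    for dirpath, _dirnames, _filenames in os.walk(root):
        # Depth of this directory relative to the root.
        depth = dirpath.rstrip(os.sep).count(os.sep) - base
        deepest = max(deepest, depth)
    return deepest


if __name__ == "__main__":
    # Hypothetical usage: pass the directory named in urls.txt, e.g. /export/home/vmf
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(target, "has", max_depth(target), "level(s) of sub-folders")
```

Running it against the directory listed in urls.txt gives a number to start from instead of trial and error.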
Re: index local system
Posted by Valmir Macário <va...@gmail.com>.
Alexander, Christoph and all,
When I ran the crawl command, it gave this error:
050919 092356 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
050919 092356 parsing: /files/home/vmf/nutch-0.7/plugins/query-url/plugin.xml
050919 092356 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
050919 092356 not including: /files/home/vmf/nutch-0.7/plugins/urlfilter-regex
050919 092356 not including: /files/home/vmf/nutch-0.7/plugins/urlfilter-prefix
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.nutch.db.WebDBInjector.addPage(WebDBInjector.java:437)
    at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:378)
    at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
    at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
Caused by: java.lang.RuntimeException: org.apache.nutch.net.URLFilter not found.
    at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:44)
    ... 4 more
I fixed it by putting this in nutch-site.xml:
<property>
<name>plugin.includes</name>
<value>protocol-file|protocol-http|parse-(text|html|msword|pdf)|index-basic|query-(basic|site|url)|urlfilter-regex</value>
</property>
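The "not including" lines in the log suggest that no URL filter plugin was being loaded before this change, which would fit the "URLFilter not found" error. To see which plugins a given plugin.includes value lets through, the expression can be tried out in isolation. This is a minimal sketch in Python, not Nutch code: it assumes the value is treated as a regular expression that must match the whole plugin directory name, and the list of plugin names is only a sample taken from this message.

```python
import re

# The plugin.includes value from nutch-site.xml above.
PLUGIN_INCLUDES = (
    "protocol-file|protocol-http|parse-(text|html|msword|pdf)|"
    "index-basic|query-(basic|site|url)|urlfilter-regex"
)


def included(plugin_ids, includes=PLUGIN_INCLUDES):
    """Return the plugin ids that the includes expression matches in full.

    Assumption: the whole id has to match, not just a part of it.
    """
    pattern = re.compile(includes)
    return [pid for pid in plugin_ids if pattern.fullmatch(pid)]


if __name__ == "__main__":
    # Sample ids only; a real installation has more plugin directories.
    sample = ["protocol-file", "parse-pdf", "query-url",
              "urlfilter-regex", "urlfilter-prefix"]
    print(included(sample))
```

With this value urlfilter-regex is matched while urlfilter-prefix is still left out, so under that assumption at least one URL filter would be available.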
My urls.txt file contains: file:/export/home/vmf
but it is indexing everything beyond the home directory.
How can I index another account on the intranet?
I tried putting the IP address in crawl-urlfilter.txt, but without success.
Can someone give me a suggestion, please?
Thanks, Valmir
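For experimenting with crawl-urlfilter.txt rules outside of a crawl, the rule matching can be imitated in a few lines. This is a minimal sketch in Python, not the Nutch implementation: it assumes each rule is a "+" (accept) or "-" (reject) followed by a regular expression, that the first rule found anywhere in the URL decides, and that a URL matching no rule is rejected. The rules shown are hypothetical examples built from the path in urls.txt. Note that a file: URL such as file:/export/home/vmf contains no host part, so a rule written around an IP address would presumably never match it.

```python
import re

# Hypothetical rules in the style of crawl-urlfilter.txt.
RULES = """
# accept anything under the directory named in urls.txt
+^file:/export/home/vmf
# reject everything else
-.
"""


def parse_rules(text):
    """Turn rule lines into (accept, compiled_regex) pairs, skipping comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Apply the rules in order; the first one that matches decides."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # assumption: no matching rule means the URL is dropped


if __name__ == "__main__":
    rules = parse_rules(RULES)
    for url in ("file:/export/home/vmf/docs/a.txt",
                "file:/export/home/other/a.txt",
                "http://192.168.0.1/"):
        print(url, "->", "accepted" if accepts(rules, url) else "rejected")
```

Under these assumptions, only URLs below /export/home/vmf are kept; another account would need its own "+" line placed before the final reject rule.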
Re: index local system
Posted by Alexander Genaud <al...@gmail.com>.
Valmir,
I am doing a similar thing, though not the entire hard drive, just a
large section. I'm taking a website offline, indexing it and dropping
the whole thing onto a CD. I've downloaded Jetty, placed the files I
want to index in the webapps directory and pointed Nutch at
http://localhost:8080/
Then I can place nutch.war in webapps and with a few little tweaks, I
can do local search.
Incidentally, does anyone have suggestions for pre-compiling nutch so
that the servlet container doesn't have to compile (the JSPs?) at
run-time? (I've compiled two JSPs [search.jsp] as servlets, but it still
doesn't seem to work on machines lacking the JDK.) I ask because I'd
like to distribute a CD-ROM with the JRE but not the JDK.
Cheers,
Alex
--
http://cph.blogsome.com
CCC7 D19D D107 F079 2F3D BF97 8443 DB5A 6DB8 9CE1