Posted to user@nutch.apache.org by Richard Braman <rb...@taxcodesoftware.org> on 2006/09/19 01:30:28 UTC

Nutch running on FC5 - No search results yet

Thanks, Andrzej. When I installed Fedora Core 5 it had an option for a Java
development kit, which I incorrectly assumed was the Sun JDK. I was able to
get the Sun JDK up and running on Fedora using the JPackage Sun compat
package. There are some good instructions for others who may need to get
Nutch up on FC5 (http://ccl.net/cca/software/SOURCES/JAVA/JSDK-1.5/index.shtml).
Now Nutch is somewhat working on FC5, but there are still a couple of
problems I never ran into on Windows. One is the null metadata error on
fetch, which causes the fetch to abort. With this host filtered out, I was
able to fetch without any problem. The error only happens on one specific
host, so maybe something on it is breaking the HTML parser; I posted this
problem in another thread. I don't think it is unique to FC5, as I never
crawled this same site on Windows.
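Filtering a host out, as described above, is normally done with a rule in Nutch's regex URL filter file (conf/regex-urlfilter.txt, or conf/crawl-urlfilter.txt when using the one-step crawl command). The Python sketch below mimics those semantics as I understand them: rules are checked in order, the first pattern that matches anywhere in the URL decides, a leading `+` accepts and `-` rejects, and a URL matching no rule is dropped. It is an illustration, not Nutch's own code, and the host `badhost.example.com` is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rules in the style of regex-urlfilter.txt: a sign ('+' accept,
# '-' reject) followed by a regular expression.
RULES_TEXT = r"""
# skip the problem host (hypothetical name)
-^http://([a-z0-9-]*\.)*badhost\.example\.com/
# accept everything else
+.
"""

Rule = Tuple[bool, re.Pattern]


def parse_rules(text: str) -> List[Rule]:
    """Turn '+regex' / '-regex' lines into (accept, compiled pattern) pairs."""
    rules: List[Rule] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def filter_url(url: str, rules: List[Rule]) -> Optional[str]:
    """Return the URL if the first matching rule accepts it, else None."""
    for accept, pattern in rules:
        if pattern.search(url):  # a match anywhere in the URL counts
            return url if accept else None
    return None  # no rule matched: the URL is dropped


rules = parse_rules(RULES_TEXT)
print(filter_url("http://www.badhost.example.com/page.html", rules))
print(filter_url("http://www.example.org/alaska.html", rules))
```

Because the first match wins, the reject rule for the problem host has to come before the catch-all `+.` line.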

The second error is with the searcher. Even though I was able to fetch and
index some pages, I am not getting any search results. My Tomcat log
files look good, but no results are being returned. I know I can look at
the index through the Luke GUI, but I can't get X Windows up. I still have
an open issue that requires recompiling the kernel with fglrx for the ATI
video drivers. Lubos recommended trying CentOS instead of Fedora, so I might
try that, as the lack of X Windows can be frustrating. I am sure there is
a way to use Luke from the command line; maybe I will try that to make sure
I have data in my index.

Here is my Catalina log. I am running on Tomcat 5.

2006-09-18 18:37:59,678 INFO  PluginRepository -        Nutch Content Parser (org.apache.nutch.parse.Parser)
2006-09-18 18:37:59,679 INFO  PluginRepository -        Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2006-09-18 18:37:59,679 INFO  PluginRepository -        Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2006-09-18 18:37:59,679 INFO  PluginRepository -        Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2006-09-18 18:37:59,758 INFO  NutchBean - creating new bean
2006-09-18 18:37:59,814 INFO  NutchBean - opening indexes in alaskacruises/indexes
2006-09-18 18:38:00,030 INFO  Configuration - found resource common-terms.utf8 at file:/var/lib/tomcat5/webapps/nutch-0.9-dev/WEB-INF/classes/common-terms.utf8
2006-09-18 18:38:00,060 INFO  NutchBean - opening segments in alaskacruises/segments
2006-09-18 18:38:00,125 INFO  SummarizerFactory - Using the first summarizer extension found: Basic Summarizer
2006-09-18 18:38:00,125 INFO  NutchBean - opening linkdb in alaskacruises/linkdb
2006-09-18 18:38:00,151 INFO  NutchBean - query request from 192.168.1.34
2006-09-18 18:38:00,228 INFO  NutchBean - query: alaska
2006-09-18 18:38:00,229 INFO  NutchBean - lang: en
2006-09-18 18:38:00,423 INFO  NutchBean - searching for 20 raw hits
2006-09-18 18:38:00,632 INFO  NutchBean - total hits: 0
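To see at a glance what the searcher opened and what it returned, the relevant facts can be pulled out of log lines like these. This Python sketch assumes the exact `NutchBean - opening <what> in <path>` and `NutchBean - total hits: N` message formats shown in this log (they may differ between Nutch versions). The paths here are relative; as an assumption, the `missing_dirs` helper resolves them against a base directory you supply, such as the working directory Tomcat was started from, and lists any that do not exist there.

```python
import os
import re
from typing import Dict, List, Optional

# Patterns for the NutchBean messages seen in the catalina log above.
OPENING_RE = re.compile(r"NutchBean - opening (\w+) in (\S+)")
HITS_RE = re.compile(r"NutchBean - total hits: (\d+)")


def summarize_log(lines: List[str]) -> Dict[str, object]:
    """Collect the directories NutchBean opened and the last reported hit count."""
    opened: Dict[str, str] = {}
    total_hits: Optional[int] = None
    for line in lines:
        m = OPENING_RE.search(line)
        if m:
            opened[m.group(1)] = m.group(2)  # e.g. 'indexes' -> 'alaskacruises/indexes'
            continue
        m = HITS_RE.search(line)
        if m:
            total_hits = int(m.group(1))
    return {"opened": opened, "total_hits": total_hits}


def missing_dirs(opened: Dict[str, str], base_dir: str) -> List[str]:
    """List opened paths that are not directories when resolved against base_dir.

    Absolute paths are checked as-is (os.path.join keeps them unchanged).
    """
    return [path for path in opened.values()
            if not os.path.isdir(os.path.join(base_dir, path))]


log = [
    "2006-09-18 18:37:59,814 INFO  NutchBean - opening indexes in alaskacruises/indexes",
    "2006-09-18 18:38:00,060 INFO  NutchBean - opening segments in alaskacruises/segments",
    "2006-09-18 18:38:00,125 INFO  NutchBean - opening linkdb in alaskacruises/linkdb",
    "2006-09-18 18:38:00,632 INFO  NutchBean - total hits: 0",
]
summary = summarize_log(log)
print(summary)
```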



Re: Nutch running on FC5 - No search results yet

Posted by Richard Braman <rb...@taxcodesoftware.org>.
Vishal Shah wrote:
> Hello Richard,
>
>    How big is your index? 
My index is quite small, probably around 1,000 pages.

> I have had these problems sometimes. In some
> cases, it's because there isn't enough memory to search the results.
> This can be resolved by setting JAVA_OPTS="-Xmx512m" or some other
> suitable value. Try setting this env variable or putting the export
> statement in the bin/startup.sh script for Tomcat to see if it works.
>
>   
OK, this is good general knowledge, but I don't think this is my problem.
>   Also, have you merged your indices by running the bin/nutch merge
> command? Make sure the output of this command goes to a dir called
> index, because that's what the searcher looks for first. If not found, it
> will look for the indexes directory. 
I have an indexes directory. I haven't run merge: since I have only run
the index step once, I didn't need to. I just do the basic generate, fetch,
and updatedb steps, followed by invertlinks and then an index command. Only
after the next index command, at which point I have two indexes, do I run
the merge command. Sometimes I just delete the index and reindex everything.
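The incremental cycle described above (generate, fetch, updatedb, invertlinks, index, and a merge only once more than one index exists) can be written down as a dry-run list of `bin/nutch` invocations. This Python sketch only builds the command lines and runs nothing; the argument order follows my reading of the Nutch 0.8 tutorial and should be checked against the usage message each command prints. The segment name `20060918183000` and the second index directory `indexes2` are hypothetical; the real segment is the timestamped directory that `generate` creates.

```python
from typing import List

NUTCH = "bin/nutch"  # path to the Nutch launcher script


def crawl_round(crawl: str, segment: str, index_out: str) -> List[List[str]]:
    """Commands for one generate/fetch/updatedb/invertlinks/index round.

    `segment` is passed in because this is a dry run; normally it is the
    newest directory that `generate` created under <crawl>/segments.
    """
    crawldb = f"{crawl}/crawldb"
    linkdb = f"{crawl}/linkdb"
    seg = f"{crawl}/segments/{segment}"
    return [
        [NUTCH, "generate", crawldb, f"{crawl}/segments"],
        [NUTCH, "fetch", seg],
        [NUTCH, "updatedb", crawldb, seg],
        [NUTCH, "invertlinks", linkdb, seg],
        [NUTCH, "index", f"{crawl}/{index_out}", crawldb, linkdb, seg],
    ]


def merge_step(crawl: str, index_dirs: List[str]) -> List[List[str]]:
    """Merge into <crawl>/index only when there is more than one index."""
    if len(index_dirs) < 2:
        return []  # a single index is searched as-is
    return [[NUTCH, "merge", f"{crawl}/index"] + [f"{crawl}/{d}" for d in index_dirs]]


for cmd in crawl_round("alaskacruises", "20060918183000", "indexes"):
    print(" ".join(cmd))
for cmd in merge_step("alaskacruises", ["indexes", "indexes2"]):
    print(" ".join(cmd))
```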
> Try giving this a shot too.
>
> Regards,
>
> -vishal. 



RE: Nutch running on FC5 - No search results yet

Posted by Vishal Shah <vi...@rediff.co.in>.
Hello Richard,

How big is your index? I have run into this problem a few times. In some
cases, it's because there isn't enough memory to run the search.
This can be resolved by setting JAVA_OPTS="-Xmx512m" or some other
suitable value. Try setting this environment variable, or put the export
statement in Tomcat's bin/startup.sh script, to see if it helps.
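If Tomcat is launched from a script, the heap setting suggested above can also be added to the environment programmatically. This Python sketch (an illustration, not part of Nutch or Tomcat) returns a copy of an environment whose JAVA_OPTS carries the requested -Xmx value, dropping any earlier -Xmx so the setting stays unambiguous; the 512m default is just the example value from this message.

```python
import os
from typing import Dict, Mapping, Optional


def with_max_heap(env: Optional[Mapping[str, str]] = None, size: str = "512m") -> Dict[str, str]:
    """Return a copy of env (default: os.environ) whose JAVA_OPTS sets -Xmx<size> once."""
    new_env = dict(os.environ if env is None else env)
    # Keep every existing option except earlier -Xmx settings.
    opts = [o for o in new_env.get("JAVA_OPTS", "").split() if not o.startswith("-Xmx")]
    opts.append(f"-Xmx{size}")
    new_env["JAVA_OPTS"] = " ".join(opts)
    return new_env


env = with_max_heap({"JAVA_OPTS": "-Dfile.encoding=UTF-8 -Xmx128m"})
print(env["JAVA_OPTS"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting Tomcat's `bin/startup.sh`; the input mapping itself is never modified.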

Also, have you merged your indexes by running the bin/nutch merge
command? Make sure the output of this command goes to a directory called
index, because that's what the searcher looks for first. If that is not
found, it will look for the indexes directory. Try giving this a shot too.
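The lookup order described above (a merged `index` directory first, then the `indexes` directory) can be sketched as a small helper. This is a hypothetical Python function that mirrors that description, not NutchBean's actual code; `part-00000` in the demo is just an example subdirectory name.

```python
import os
import tempfile
from typing import Optional


def pick_index_dir(crawl_dir: str) -> Optional[str]:
    """Return the directory a searcher following the rule above would open."""
    for name in ("index", "indexes"):  # the merged index is preferred
        candidate = os.path.join(crawl_dir, name)
        if os.path.isdir(candidate):
            return candidate
    return None  # neither exists: nothing to search


with tempfile.TemporaryDirectory() as crawl:
    os.makedirs(os.path.join(crawl, "indexes", "part-00000"))
    print(pick_index_dir(crawl))  # only indexes/ exists, so it is chosen
    os.makedirs(os.path.join(crawl, "index"))
    print(pick_index_dir(crawl))  # once index/ exists, it takes precedence
```

Under this rule, a stale `index` directory left over from an earlier merge would shadow a freshly built `indexes` directory, which is worth ruling out when searches return nothing.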

Regards,

-vishal. 



-----Original Message-----
From: Richard Braman [mailto:rbraman@taxcodesoftware.org] 
Sent: Tuesday, September 19, 2006 5:00 AM
To: nutch-user@lucene.apache.org
Subject: Nutch running on FC5 - No search results yet
