Posted to user@nutch.apache.org by Robin Haswell <ro...@bronco.co.uk> on 2006/12/20 12:02:49 UTC
Web interface problems
Hey there
I'm having issues searching with my newly (vastly) expanded database.
Could anyone shed any light on this? Basically, on a newly started
server, I search for "test", and this appears in catalina.out:
2006-12-20 10:51:40,710 INFO NutchBean - creating new bean
2006-12-20 10:51:40,725 INFO NutchBean - opening merged index in crawl/index
2006-12-20 10:51:40,871 INFO Configuration - found resource common-terms.utf8 at file:/nutch/apache-tomcat-5.5/webapps/ROOT/WEB-INF/classes/common-terms.utf8
2006-12-20 10:51:40,880 INFO NutchBean - opening segments in crawl/segments
2006-12-20 10:51:40,898 INFO SummarizerFactory - Using the first summarizer extension found: Basic Summarizer
2006-12-20 10:51:40,901 INFO NutchBean - opening linkdb in crawl/linkdb
2006-12-20 10:51:40,907 INFO NutchBean - query request from 195.166.60.2
2006-12-20 10:51:40,925 INFO NutchBean - query: test
2006-12-20 10:51:40,925 INFO NutchBean - lang: en
2006-12-20 10:51:40,974 INFO NutchBean - searching for 20 raw hits
2006-12-20 10:52:13,306 ERROR [jsp] - Servlet.service() for servlet jsp threw exception
java.lang.OutOfMemoryError: Java heap space
If I then refresh the page (which is blank, by the way), I get this:
2006-12-20 10:53:23,729 INFO NutchBean - query request from 195.166.60.2
2006-12-20 10:53:23,730 INFO NutchBean - query: test
2006-12-20 10:53:23,730 INFO NutchBean - lang: en
2006-12-20 10:53:23,735 INFO NutchBean - searching for 20 raw hits
2006-12-20 10:54:04,685 ERROR [jsp] - Servlet.service() for servlet jsp threw exception
java.lang.RuntimeException: java.lang.NoClassDefFoundError
...plus a lot of stack trace. The odd thing, though, is what happens if I do this:
rob@nutchwizz:/nutch$ bin/nutch org.apache.nutch.searcher.NutchBean test
Total hits: 64106
0 20061215102534/http://www.dyslexia-test.co.uk/
... About us About dyslexia Dyslexia Test 7-16 Dyslexia Test for Adults
Frequently ... results in the test ...
1 20061215102534/http://www.dsa.gov.uk/
[etc]
It works absolutely fine. Does anyone have any idea what might be
preventing the web interface from working properly? I have seen this
tomcat installation work with exactly the same webapp before - that is,
before I expanded the index.
rob@nutchwizz:/nutch$ bin/nutch readdb crawl/crawldb/ -stats
CrawlDb statistics start: crawl/crawldb/
Statistics for CrawlDb: crawl/crawldb/
TOTAL urls: 11502550
retry 0: 11429183
retry 1: 61224
retry 2: 10594
retry 3: 1549
min score: 0.0
avg score: 0.05785237
max score: 1309.991
status 1 (DB_unfetched): 9067758
status 2 (DB_fetched): 2221161
status 3 (DB_gone): 213631
CrawlDb statistics: done
Any help would be great
Thanks
-Rob
Re: Web interface problems
Posted by Andrzej Bialecki <ab...@getopt.org>.
Robin Haswell wrote:
> On Wed, 2006-12-20 at 12:38 +0100, Andrzej Bialecki wrote:
>
>> This is the problem - you need to increase the heap space in your
>> Tomcat. Since you expanded your index, the bigger index won't fit in the
>> same heap space as before ... especially when you run searches that
>> touch more of the index, parts of it need to be loaded into memory - so
>> this problem may not occur for searches that return only a few results.
>>
>
> I see... so how come I can run searches with the NutchBean though?
> Anyway.. how do I increase my heap size and approximately what should I
> increase it to? My crawl directory is 41GB and this machine has only 1GB
> of main memory and 3GB of swap
>
As soon as you start swapping, the performance will become abysmal ...
so you need to pick a size that can still hold your index but doesn't
cause swapping. 512M? 768M? Who knows ... Please check Tomcat
documentation on how to set it properly - I usually don't bother with
the "proper" way and hardcode this value in catalina.sh script ... ;)
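
A minimal sketch of one way to raise the limit without editing catalina.sh, assuming a Bourne-style shell: catalina.sh passes the contents of the CATALINA_OPTS environment variable to the JVM, so a maximum heap can be supplied there. The 768 MB figure is only one of the guesses floated above, not a tested recommendation, and the Tomcat path is taken from the catalina.out excerpt earlier in the thread.

```shell
# Assumed value: 768 MB is just one of the candidate sizes mentioned above.
HEAP_MAX_MB=768

# catalina.sh appends CATALINA_OPTS to the java command line it builds,
# so -Xmx here becomes the maximum heap of the Tomcat JVM.
export CATALINA_OPTS="-Xmx${HEAP_MAX_MB}m"
echo "CATALINA_OPTS=$CATALINA_OPTS"

# Restart Tomcat from this same shell so the setting takes effect, e.g.:
#   /nutch/apache-tomcat-5.5/bin/shutdown.sh
#   /nutch/apache-tomcat-5.5/bin/startup.sh
```

The variable only affects Tomcat instances started from a shell where it is exported, so it has to be set again (or placed in whatever script starts Tomcat at boot) to survive a reboot.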
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Re: Web interface problems
Posted by Robin Haswell <ro...@bronco.co.uk>.
On Wed, 2006-12-20 at 12:38 +0100, Andrzej Bialecki wrote:
> This is the problem - you need to increase the heap space in your
> Tomcat. Since you expanded your index, the bigger index won't fit in the
> same heap space as before ... especially when you run searches that
> touch more of the index, parts of it need to be loaded into memory - so
> this problem may not occur for searches that return only a few results.
I see... so how come I can run searches with the NutchBean though?
Anyway.. how do I increase my heap size and approximately what should I
increase it to? My crawl directory is 41GB and this machine has only 1GB
of main memory and 3GB of swap
Thanks
-Rob
Re: Web interface problems
Posted by Andrzej Bialecki <ab...@getopt.org>.
Robin Haswell wrote:
> Hey there
>
> I'm having issues searching with my newly (vastly) expanded database.
> Could anyone shed any light on this? Basically, on a newly started
> server, I search for "test", and this appears in catalina.out:
>
> 2006-12-20 10:51:40,710 INFO NutchBean - creating new bean
> 2006-12-20 10:51:40,725 INFO NutchBean - opening merged index in crawl/index
> 2006-12-20 10:51:40,871 INFO Configuration - found resource common-terms.utf8 at file:/nutch/apache-tomcat-5.5/webapps/ROOT/WEB-INF/classes/common-terms.utf8
> 2006-12-20 10:51:40,880 INFO NutchBean - opening segments in crawl/segments
> 2006-12-20 10:51:40,898 INFO SummarizerFactory - Using the first summarizer extension found: Basic Summarizer
> 2006-12-20 10:51:40,901 INFO NutchBean - opening linkdb in crawl/linkdb
> 2006-12-20 10:51:40,907 INFO NutchBean - query request from 195.166.60.2
> 2006-12-20 10:51:40,925 INFO NutchBean - query: test
> 2006-12-20 10:51:40,925 INFO NutchBean - lang: en
> 2006-12-20 10:51:40,974 INFO NutchBean - searching for 20 raw hits
> 2006-12-20 10:52:13,306 ERROR [jsp] - Servlet.service() for servlet jsp threw exception
> java.lang.OutOfMemoryError: Java heap space
>
This is the problem - you need to increase the heap space in your
Tomcat. Since you expanded your index, the bigger index won't fit in the
same heap space as before ... especially when you run searches that
touch more of the index, parts of it need to be loaded into memory - so
this problem may not occur for searches that return only a few results.
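
One way to check how much heap each JVM was actually given is to look for the -Xmx option on its command line. The sketch below is illustrative only: xmx_of is a hypothetical helper, and both command lines are made-up examples rather than output captured from this machine. As far as I know, the bin/nutch script starts Java with -Xmx1000m unless NUTCH_HEAPSIZE is set, while a Tomcat started without any heap options falls back to the JVM default, which may be why the same search succeeds from the command line.

```shell
# Hypothetical helper: print the last -Xmx option in a Java command line
# (prints nothing if the command line has no -Xmx option).
xmx_of() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep -e '^-Xmx' | tail -n 1
}

# Made-up command lines for illustration, not captured from a real system.
tomcat_cmd="java -classpath bootstrap.jar org.apache.catalina.startup.Bootstrap start"
nutch_cmd="java -Xmx1000m -classpath conf org.apache.nutch.searcher.NutchBean test"

echo "tomcat: $(xmx_of "$tomcat_cmd")"
echo "nutch:  $(xmx_of "$nutch_cmd")"
```

On a live system the command line of a running JVM can be obtained with ps -o args= -p followed by its process id and passed to the helper in place of the sample strings.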
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com