Posted to user@nutch.apache.org by Robin Haswell <ro...@bronco.co.uk> on 2006/12/20 12:02:49 UTC

Web interface problems

Hey there

I'm having issues searching with my newly (vastly) expanded database.
Could anyone shed any light on this? Basically, on a newly started
server, I search for "test", and this appears in catalina.out:

2006-12-20 10:51:40,710 INFO  NutchBean - creating new bean
2006-12-20 10:51:40,725 INFO  NutchBean - opening merged index in crawl/index
2006-12-20 10:51:40,871 INFO  Configuration - found resource common-terms.utf8 at file:/nutch/apache-tomcat-5.5/webapps/ROOT/WEB-INF/classes/common-terms.utf8
2006-12-20 10:51:40,880 INFO  NutchBean - opening segments in crawl/segments
2006-12-20 10:51:40,898 INFO  SummarizerFactory - Using the first summarizer extension found: Basic Summarizer
2006-12-20 10:51:40,901 INFO  NutchBean - opening linkdb in crawl/linkdb
2006-12-20 10:51:40,907 INFO  NutchBean - query request from 195.166.60.2
2006-12-20 10:51:40,925 INFO  NutchBean - query: test
2006-12-20 10:51:40,925 INFO  NutchBean - lang: en
2006-12-20 10:51:40,974 INFO  NutchBean - searching for 20 raw hits
2006-12-20 10:52:13,306 ERROR [jsp] - Servlet.service() for servlet jsp threw exception
java.lang.OutOfMemoryError: Java heap space

If I then refresh the page (which is blank, by the way), I get this:

2006-12-20 10:53:23,729 INFO  NutchBean - query request from 195.166.60.2
2006-12-20 10:53:23,730 INFO  NutchBean - query: test
2006-12-20 10:53:23,730 INFO  NutchBean - lang: en
2006-12-20 10:53:23,735 INFO  NutchBean - searching for 20 raw hits
2006-12-20 10:54:04,685 ERROR [jsp] - Servlet.service() for servlet jsp threw exception
java.lang.RuntimeException: java.lang.NoClassDefFoundError

...plus a lot of stack trace. The odd thing, though, is that if I do this:

rob@nutchwizz:/nutch$ bin/nutch org.apache.nutch.searcher.NutchBean test
Total hits: 64106
 0 20061215102534/http://www.dyslexia-test.co.uk/
 ... About us About dyslexia Dyslexia Test 7-16 Dyslexia Test for Adults
Frequently ... results in the test ... 
 1 20061215102534/http://www.dsa.gov.uk/
[etc]

It works absolutely fine. Does anyone have any idea what might be
preventing the web interface from working properly? I have seen this
Tomcat installation work with exactly the same webapp before, that is,
before I expanded the index.

rob@nutchwizz:/nutch$ bin/nutch readdb crawl/crawldb/ -stats
CrawlDb statistics start: crawl/crawldb/
Statistics for CrawlDb: crawl/crawldb/
TOTAL urls:     11502550
retry 0:        11429183
retry 1:        61224
retry 2:        10594
retry 3:        1549
min score:      0.0
avg score:      0.05785237
max score:      1309.991
status 1 (DB_unfetched):        9067758
status 2 (DB_fetched):  2221161
status 3 (DB_gone):     213631
CrawlDb statistics: done


Any help would be great

Thanks

-Rob


Re: Web interface problems

Posted by Andrzej Bialecki <ab...@getopt.org>.
Robin Haswell wrote:
> I see... so how come I can run searches with the NutchBean though?
> Anyway... how do I increase my heap size, and approximately what should I
> increase it to? My crawl directory is 41GB and this machine has only 1GB
> of main memory and 3GB of swap.
>   

As soon as you start swapping, performance will become abysmal, so you
need to pick a heap size that can still hold your index but doesn't
cause swapping. 512M? 768M? Who knows... Please check the Tomcat
documentation on how to set it properly. I usually don't bother with
the "proper" way and just hardcode the value in the catalina.sh script ;)
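For reference, here is a minimal sketch of the less hacky route. It assumes Tomcat's catalina.sh sources bin/setenv.sh when that file exists and passes CATALINA_OPTS to the JVM on start. The helper names (total_ram_mb, write_tomcat_heap), the refusal to set a heap at or above physical RAM, and the choice to overwrite any existing setenv.sh are illustrative assumptions. None of them is something Tomcat or Nutch provides:

```shell
#!/bin/sh
# Sketch only: raise Tomcat's max heap through bin/setenv.sh, which
# catalina.sh sources at startup if present. Helper names and the RAM
# check are hypothetical, not part of Tomcat or Nutch.

# Print physical RAM in MB from a meminfo-style file (default /proc/meminfo).
total_ram_mb() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# write_tomcat_heap CATALINA_HOME HEAP_MB [MEMINFO]
# Writes CATALINA_HOME/bin/setenv.sh (overwriting it) with -Xmx<HEAP_MB>m.
write_tomcat_heap() {
  home="$1"
  heap_mb="$2"
  ram_mb=$(total_ram_mb "$3")

  # Give up if physical RAM could not be determined.
  [ -n "$ram_mb" ] || return 1

  # A heap at or above physical RAM is guaranteed to swap.
  if [ "$heap_mb" -ge "$ram_mb" ]; then
    echo "refusing: ${heap_mb}MB heap >= ${ram_mb}MB RAM" >&2
    return 1
  fi

  mkdir -p "$home/bin" || return 1
  # Unquoted heredoc: ${heap_mb} is expanded now, while \$CATALINA_OPTS
  # stays literal and is expanded when catalina.sh sources the file.
  cat > "$home/bin/setenv.sh" <<EOF
CATALINA_OPTS="\$CATALINA_OPTS -Xmx${heap_mb}m"
export CATALINA_OPTS
EOF
}
```

Usage, with the Tomcat path taken from the log above and 768MB as a guess for a 1GB box: run write_tomcat_heap /nutch/apache-tomcat-5.5 768, then restart Tomcat (bin/shutdown.sh, then bin/startup.sh). The RAM check only catches the obvious mistake. A heap comfortably below physical memory can still push the machine into swap once the OS, Tomcat itself and the JVM's non-heap memory are counted.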

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Web interface problems

Posted by Robin Haswell <ro...@bronco.co.uk>.
On Wed, 2006-12-20 at 12:38 +0100, Andrzej Bialecki wrote:
> This is the problem - you need to increase the heap space in your 
> Tomcat. Since you expanded your index, the bigger index won't fit in the
> same heap space as before ... especially when you run searches that 
> touch more of the index, parts of it need to be loaded into memory - so 
> this problem may not occur for searches that return only few results.

I see... so how come I can run searches with the NutchBean, though?
Anyway... how do I increase my heap size, and approximately what should I
increase it to? My crawl directory is 41GB and this machine has only 1GB
of main memory and 3GB of swap.

Thanks

-Rob


Re: Web interface problems

Posted by Andrzej Bialecki <ab...@getopt.org>.
Robin Haswell wrote:
> 2006-12-20 10:52:13,306 ERROR [jsp] - Servlet.service() for servlet jsp
> threw exception
> java.lang.OutOfMemoryError: Java heap space
>   

This is the problem: you need to increase the heap space in your
Tomcat. Since you expanded your index, the bigger index won't fit in the
same heap space as before, especially when you run searches that touch
more of the index, because parts of it need to be loaded into memory.
So this problem may not occur for searches that return only a few results.
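The difference in heap size probably also explains why a command-line search can still succeed. This assumes the stock bin/nutch script of this era, which starts its JVM with -Xmx1000m unless the NUTCH_HEAPSIZE environment variable (in MB) overrides it. A Tomcat started without any -Xmx option, by contrast, gets the JVM's default maximum heap, typically only 64MB for the client JVMs of the time. The function below is a hypothetical re-implementation of that selection logic, for illustration only. It is not code taken from Nutch:

```shell
#!/bin/sh
# Hypothetical helper, mirroring (by assumption) how bin/nutch picks its
# heap flag: NUTCH_HEAPSIZE in MB if set, otherwise a 1000MB default.
nutch_heap_flag() {
  if [ -n "$NUTCH_HEAPSIZE" ]; then
    echo "-Xmx${NUTCH_HEAPSIZE}m"
  else
    echo "-Xmx1000m"
  fi
}
```

To check the theory, start the command-line search with a Tomcat-sized heap, e.g. NUTCH_HEAPSIZE=64 bin/nutch org.apache.nutch.searcher.NutchBean test. If heap size is the cause, that run would be expected to fail with the same OutOfMemoryError.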

-- 
Best regards,
Andrzej Bialecki     <><