Posted to user@nutch.apache.org by Chirag Chaman <de...@filangy.com> on 2005/05/24 17:32:24 UTC

RE: [Nutch-general] RE: Please help: Tomcat problem, Paginatingwith optimization (Likegoggle)

I agree with Byron: your mileage may vary. For an internal site it may not make
a difference.

We use both Resin and Tomcat -- we have switched back to Tomcat for the time
being, as we upgraded Linux and Resin on Linux has a problem setting
the suid/guid.

So here's why we switched to Resin in the first place:

1. Speed is more or less the same, but under heavy load Resin performs
slightly better.
2. Tomcat runs out of memory and requires a manual restart every 6-8 hours;
Resin restarts itself automatically around once a day (again, this is a memory
leak in our code, but the automatic restart is a great feature).
3. We use load-balanced DNS, and Resin seems to work well with that model,
as it allows the DNS server to test and confirm that the server is actually
down.
4. Lastly, Tomcat will sometimes die without showing you the problem -- Resin
has a good and, IMO, more reliable logging mechanism, which helps us
troubleshoot.

That being said, as you can see, Resin was the choice for us because we needed
the reliability and better logging. For someone who is not going to change a lot
of the code, Tomcat should work very well -- so use what you are most
comfortable with for now and don't get bogged down learning something new.
Once you are ready for production, you can take a look at Resin.


Now for your other question:

If performance is a concern, then use the larger number, including deletes.
From a search perspective, deleting items does not improve the speed unless
you actually remove the deleted entries by optimizing/merging. Also, err on
the side of the larger number if performance is an issue. In this case I
would use 5MM -- and try to give the Word server 4GB of RAM (as that's the
maximum that can fit on a low-end server using the cost-effective 1GB
sticks).
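
To make the arithmetic concrete, here is a rough sketch of that sizing rule. It
assumes the rule of thumb quoted further down in this thread (about 1 GB of
memory per 1 million pages) and simply takes the largest of the page counts
the different tools report. The function names are made up for illustration;
nothing here is part of Nutch or Luke.

```python
import math

# Assumption: the rule of thumb from this thread, roughly 1 GB of RAM
# per 1 million pages served by a backend.
GB_PER_MILLION_PAGES = 1.0


def sizing_page_count(*reported_counts):
    """Return the page count to size a backend with (hypothetical helper).

    Deleted entries still sit in the index until an optimize/merge removes
    them, so we err on the side of the largest count any tool reports.
    """
    if not reported_counts:
        raise ValueError("need at least one reported page count")
    return max(reported_counts)


def ram_gb_needed(pages):
    """Round the rule-of-thumb RAM estimate up to whole gigabytes."""
    return math.ceil(pages / 1_000_000 * GB_PER_MILLION_PAGES)


# The three figures reported for the same segment in the question below:
# segread -list, Luke (live records), and Luke after undelete.
pages = sizing_page_count(500_000, 420_105, 438_000)
print(pages, "pages ->", ram_gb_needed(pages), "GB")
```

The same helper can be fed the totals across all segments on a backend; the
point is only that the pre-optimize count, not the live count, drives the
estimate.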


-----Original Message-----
From: Byron Miller [mailto:Byron_Miller@compaid.com] 
Sent: Tuesday, May 24, 2005 10:10 AM
To: nutch-user@incubator.apache.org
Subject: Re: [Nutch-general] RE: Please help: Tomcat problem, Paginatingwith
optimization (Likegoggle)

The famous quote is "Your mileage may vary". There is an open source version
of Resin that you can run - caucho.com.

Like I said, I've been running Nutch under Resin for a LONG time. Under
Tomcat I had issue after issue.

-byron

-----Original Message-----
From: "yoursoft@freemail.hu" <yo...@freemail.hu>
To: user@nutch.org
Date: Tue, 24 May 2005 09:01:47 +0200
Subject: Re: [Nutch-general] RE: Please help: Tomcat problem, Paginating
with optimization (Likegoggle)

> Dear Chirag and Byron,
> 
> Thanks for the suggestions, but I don't have any problems with other 
> applications under Tomcat. The problem occurs only with Nutch.
> There is a free version of Resin; is it truly better than Tomcat?
> 
> Dear Chirag, you wrote to put 1G of memory per 1 million pages on the 
> backend.
> How do I calculate the number of pages in the segments?
> If I use the 'bin/nutch segread -list' tool, it says there are 500000 
> pages in a segment.
> If I use the 'lukeall.jar' tool, it says there are 420105 records in that 
> segment.
> If I use the 'lukeall.jar' undelete function, there are 438000 records in 
> the same segment.
> If I search for 'http' in the web search engine, it reports the same 
> number as 'lukeall.jar'.
> 
> Which number should I use to calculate pages per backend?
> 
> Thanks, Ferenc
>