Posted to user@nutch.apache.org by Kai_testing Middleton <ka...@yahoo.com> on 2007/07/27 03:04:25 UTC
Multiple Nutch Instances
Regarding:
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg08854.html
I too want to run multiple nutch instances.
I have a two-CPU development box (two cores per CPU) on which to develop my search application. I have installed a nightly build of nutch, and that installation is currently working on a crawl that will take many days to complete. In the meantime, I want to be able to run some other tests. At this stage I'm more interested in the whole crawl cycle (inject, generate, fetch, updatedb, invertlinks, index) than in search.
So for instance, I'd like to install an even more recent nightly build, then run some short crawls with it. Maybe I'd like to have another version of nutch that I hack up. I'd want to play with it even as one of the other instances is running a crawl.
My current installation is in:
/usr/local/nutch-2007-06-27_06-52-44
I've also noticed that the log file hadoop.log gets created here:
/var/tmp/nutch-2007-06-27_06-52-44
Other than these I haven't seen any environment variables or other global properties that might conflict. So it seems I could just install to
/usr/local/new_nutch
and I presume that this would be created:
/var/tmp/new_nutch
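
To make the idea concrete, here is a minimal Python sketch of driving one pass of the crawl cycle against a specific installation directory. The sub-command names are the ones listed above; the argument order is an assumption based on my recollection of the 0.8-era tutorial and should be checked against the usage text that bin/nutch prints. The crawl and seed directories are hypothetical, and the script defaults to a dry run that only prints the commands.

```python
import os
import subprocess


def run_cycle(nutch_home, crawl_dir, seed_dir, dry_run=True):
    """Run (or just print) inject, generate, fetch, updatedb, invertlinks, index.

    nutch_home selects which installation's bin/nutch is used, so two
    installations can be driven with two different crawl_dir values.
    Argument order for each sub-command is an assumption; verify it locally.
    """
    nutch = os.path.join(nutch_home, "bin", "nutch")
    crawldb = os.path.join(crawl_dir, "crawldb")
    linkdb = os.path.join(crawl_dir, "linkdb")
    segments = os.path.join(crawl_dir, "segments")
    indexes = os.path.join(crawl_dir, "indexes")
    commands = []

    def step(*args):
        cmd = [nutch, *args]
        commands.append(cmd)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

    step("inject", crawldb, seed_dir)
    step("generate", crawldb, segments)

    if dry_run:
        # Placeholder only: the real segment name is chosen by "generate".
        segment = os.path.join(segments, "<newest-segment>")
    else:
        # Segment directories are named by timestamp, so the largest name
        # is the one that "generate" just created.
        segment = os.path.join(segments, max(os.listdir(segments)))

    step("fetch", segment)
    step("updatedb", crawldb, segment)
    step("invertlinks", linkdb, segment)
    step("index", indexes, crawldb, linkdb, segment)
    return commands


if __name__ == "__main__":
    # Hypothetical paths for a short test crawl with the second installation.
    run_cycle("/usr/local/new_nutch", "/tmp/crawl-test", "/tmp/urls")
```

Giving each installation its own crawl directory keeps the crawldb, segments and indexes of the long-running crawl out of reach of the experimental one. Whether anything else is shared between installations (temporary directories, for instance) is exactly the open question of this post, and the sketch does not answer it.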
Some other discussions relating to this subject are here:
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg04838.html
As for different setups for different Nutch instances, I think you
could have multiple installations on your server where each instance
would have its own conf directory (with specific config files) and
the source code can be shared via symbolic link.
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg02138.html
Running multiple nutch instances on one box is possible but difficult.
The problem is that tomcat and also nutch (0.8 map reduce / ndfs)
use a set of TCP ports that are already taken if another unix
user is already running nutch.
The best way to go is to first use subversion or cvs as a
centralized repository for your customized code, so that all
developers can share code and work together on the same code
base. Besides that, each developer should run a tiny test
instance of nutch on her developer machine. In the end it is a
good idea to have a script that once a day downloads the code
from cvs, runs a test suite and deploys the code on your 'big'
server.
http://cruisecontrol.sourceforge.net/ is a helpful tool.
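
The port concern in the message quoted above can be checked before starting a second instance. The following is a minimal Python sketch, not something from the quoted thread, that tests whether a TCP port can currently be bound on the local machine. The port numbers in the usage part are only examples; which ports a given nutch or tomcat setup actually uses depends on its configuration.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": some process holds the port.
            return False
        return True


if __name__ == "__main__":
    # Example ports only; substitute the ports from your own configuration.
    for port in (8080, 8081):
        state = "free" if port_is_free(port) else "in use"
        print(port, state)
```

The check is only a snapshot: another process can still grab the port between the check and the moment the second instance starts.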
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg05061.html
Q: Let's say I want to run 2 search engines on the same server.
For search engine one I use the database "crawl" and for the
second search engine I use "crawl2" as the database. For
accessing the content could I use different ports for each
engine? engine one will be localhost:8080 and engine two will
be localhost:8081. Just asking if this is possible.
A: Yes, this is possible. You can use different ports,
different virtual hosts or different context paths to separate
the two UIs. You still need to have two separate web applications
with two separate configurations (pointing to two separate
directories).
Q: The two different web applications are really no big deal.
Could I be pointed in the right direction for setting this up?
Someone else set up nutch/tomcat/java for me, so I am not
exactly sure where I would set up the virtual host or where a
config file would exist that would point to the database path.
A: I guess the simplest way to do it is to copy the nutch war
file under <TOMCAT>/webapps with two different names
(search1.war and search2.war). Then, after tomcat has extracted
the archives, edit the file
<TOMCAT>/webapps/search1/WEB-INF/classes/nutch-site.xml and
change searcher.dir to point to the correct directory. For the
other instance the configuration file is
<TOMCAT>/webapps/search2/WEB-INF/classes/nutch-site.xml
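
As an illustration of the answer quoted above, here is a minimal Python sketch that produces the contents of a nutch-site.xml holding only a searcher.dir property, in the usual configuration/property/name/value layout. The two crawl directory paths are hypothetical stand-ins for the "crawl" and "crawl2" databases from the question. The sketch only returns the XML text; a real nutch-site.xml may contain other properties, so merging by hand is safer than overwriting the file.

```python
import xml.etree.ElementTree as ET


def nutch_site_xml(searcher_dir):
    """Return nutch-site.xml text that sets only the searcher.dir property."""
    conf = ET.Element("configuration")
    prop = ET.SubElement(conf, "property")
    ET.SubElement(prop, "name").text = "searcher.dir"
    ET.SubElement(prop, "value").text = searcher_dir
    return ET.tostring(conf, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical locations of the two crawl directories, one per web app.
    for webapp, crawl_dir in (("search1", "/data/crawl"), ("search2", "/data/crawl2")):
        print("webapps/%s/WEB-INF/classes/nutch-site.xml:" % webapp)
        print(nutch_site_xml(crawl_dir))
```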
----- Original Message ----
From: karthik085 <ka...@gmail.com>
To: nutch-user@lucene.apache.org
Sent: Friday, July 20, 2007 3:13:24 PM
Subject: Multiple Nuch Instances
1. Can I run multiple instances of nutch for crawling/indexing? I got mixed
opinions - some say yes and some say no. Can someone who has tried this
let me know? One guy said it is difficult because multiple nutch instances
have to use different ports.
2. If I can run multiple instances of nutch, can I run nutch v 0.7.2, nutch
0.9 and nutch-dev at the same time for crawling/indexing websites?
Please let me know. Thanks.
--
View this message in context: http://www.nabble.com/Multiple-Nuch-Instances-tf4119823.html#a11716837
Sent from the Nutch - User mailing list archive at Nabble.com.