Posted to user@nutch.apache.org by Kai_testing Middleton <ka...@yahoo.com> on 2007/07/27 03:04:25 UTC

Multiple Nutch Instances

Regarding:
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg08854.html

I too want to run multiple nutch instances.  

I have a two-CPU development box (two cores per CPU) on which to develop my search application.  I have installed a nightly build of nutch.  Currently that installation is working on a crawl that will take many days to complete.  In the meantime, I want to be able to try some other tests.  At this stage I'm more interested in the whole crawl cycle:  inject, generate, fetch, updatedb, invertlinks, index.  I'm less interested in search for now.
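
To make the cycle concrete, here is a rough sketch that only builds the command lines for one pass of inject, generate, fetch, updatedb, invertlinks and index against a chosen installation; it does not run anything. The argument order is my recollection of the 0.8/0.9 tutorial and should be checked against the usage output of the build in question. The crawl directory, the url directory and the segment name are made-up placeholders (generate picks the real segment name at run time).

```python
from pathlib import Path


def crawl_cycle_commands(nutch_home, crawl_dir, url_dir, segment):
    """Return one argv list per crawl step, without executing anything.

    Assumption: argument order as remembered from the Nutch 0.8/0.9
    tutorial; verify against `bin/nutch <command>` usage output.
    """
    nutch = str(Path(nutch_home) / "bin" / "nutch")
    crawl = Path(crawl_dir)
    crawldb = str(crawl / "crawldb")
    linkdb = str(crawl / "linkdb")
    segments = str(crawl / "segments")
    # Hypothetical: the caller supplies the segment name that generate created.
    seg = str(crawl / "segments" / segment)
    return [
        [nutch, "inject", crawldb, url_dir],
        [nutch, "generate", crawldb, segments],
        [nutch, "fetch", seg],
        [nutch, "updatedb", crawldb, seg],
        [nutch, "invertlinks", linkdb, "-dir", segments],
        [nutch, "index", str(crawl / "indexes"), crawldb, linkdb, seg],
    ]


if __name__ == "__main__":
    # Placeholder paths; each installation gets its own crawl directory.
    for cmd in crawl_cycle_commands("/usr/local/new_nutch", "/tmp/test_crawl",
                                    "urls", "20070727030425"):
        print(" ".join(cmd))
```

Keeping the installation path and the crawl directory as parameters is the point: two instances only stay independent if neither value is shared between them.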

So for instance, I'd like to install an even more recent nightly build, then run some short crawls with it.  Maybe I'd like to have another version of nutch that I hack up.  I'd want to play with it even as one of the other instances is running a crawl.

My current installation is in:
/usr/local/nutch-2007-06-27_06-52-44

I've also noticed that the log file hadoop.log gets created here:
/var/tmp/nutch-2007-06-27_06-52-44

Other than these I haven't seen any environment variables or other global properties that might conflict.  So it seems I could just install to

/usr/local/new_nutch

and I presume that this would be created:
/var/tmp/new_nutch
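
As a sanity check on that presumption, a small sketch that derives the presumed log directory from each installation path and reports any two installations that would end up sharing one. The rule "log directory = /var/tmp/ plus the install directory's name" is only what I observed for one installation, not confirmed nutch behaviour, and the function names are my own.

```python
from pathlib import PurePosixPath


def presumed_log_dir(install_dir, tmp_root="/var/tmp"):
    # Assumption from the post: the log directory reuses the
    # install directory's last path component under /var/tmp.
    return PurePosixPath(tmp_root) / PurePosixPath(install_dir).name


def find_collisions(install_dirs):
    """Return pairs of install dirs whose presumed log dirs coincide."""
    seen = {}
    collisions = []
    for d in install_dirs:
        log_dir = presumed_log_dir(d)
        if log_dir in seen:
            collisions.append((seen[log_dir], d))
        else:
            seen[log_dir] = d
    return collisions


if __name__ == "__main__":
    installs = ["/usr/local/nutch-2007-06-27_06-52-44", "/usr/local/new_nutch"]
    for d in installs:
        print(d, "->", presumed_log_dir(d))
    print("collisions:", find_collisions(installs))
```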

Some other discussions relating to this subject are here:

http://www.mail-archive.com/nutch-user@lucene.apache.org/msg04838.html
    As for different set up for different Nutch instances I think you
    could have multiple installations on your server where each instance
    would have its own conf directory (with specific config files) and
    source code can be shared via symbolic link.

http://www.mail-archive.com/nutch-user@lucene.apache.org/msg02138.html
    Running multiple nutch instances on one box is possible but
    difficult. The problem is that tomcat and also nutch (0.8 map
    reduce / ndfs) use a set of TCP ports that are already taken
    if another unix user is already running nutch.

    The best way to go is to first use subversion or cvs as a
    centralized repository for your customized code, so that all
    developers can share code and work together on the same code
    base. Besides that, each developer should run a tiny test
    instance of nutch on her developer machine. In the end it is a
    good idea to have a script that downloads the code from cvs
    once a day, runs a test suite and deploys the code on your
    'big' server.
    http://cruisecontrol.sourceforge.net/ is a helpful tool.
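
The port concern in that message can at least be tested before starting a second instance. A minimal sketch, assuming a plain bind attempt on localhost is a good enough probe; the port numbers are only examples (8080 is Tomcat's default HTTP port, 8081 an arbitrary alternative), and the ports nutch itself uses depend on the local configuration.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if host:port can be bound right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission error.
            return False
        return True


if __name__ == "__main__":
    # Example ports only; substitute whatever the instances are configured to use.
    for port in (8080, 8081):
        state = "free" if port_is_free(port) else "in use"
        print(port, state)
```

A free port at check time can still be taken by the time the server starts, so this is a quick diagnostic rather than a guarantee.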

http://www.mail-archive.com/nutch-user@lucene.apache.org/msg05061.html
    Q: Let's say I want to run 2 search engines on the same server.
    For search engine one I use the database "crawl" and for the
    second search engine I use "crawl2" as the database.  For
    accessing the content could I use different ports for each
    engine? engine one will be localhost:8080 and engine two will
    be localhost:8081. Just asking if this is possible.

    A: Yes this is possible. You can use different ports or
    different virtualhost or different context path to separate the
    two ui's. You still need to have two separate web applications
    with two separate configurations (pointing to two separate
    directories)

    Q: the two different web applications is really no big deal. Is
    it possible that I could be pointed in the right direction or
    setting this up? Someone else setup nutch/tomcat/java for me so
    I am not exactly sure where I would set up the virtual host or
    where a config file would exist that would point to the
    database path.

    A: I guess the simplest way to do it is to copy the nutch war
    file under <TOMCAT>/webapps with two different names
    (search1.war and search2.war). After tomcat has extracted the
    archives, edit the file
    <TOMCAT>/webapps/search1/WEB-INF/classes/nutch-site.xml and
    change searcher.dir to point to the correct directory. For the
    other instance the configuration file is
    <TOMCAT>/webapps/search2/WEB-INF/classes/nutch-site.xml
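
For reference, a sketch of what each webapp's nutch-site.xml would need to contain for the searcher.dir setting described in that answer. It only generates the XML text and prints it; the two crawl directories are hypothetical paths standing in for the "crawl" and "crawl2" databases from the question, and a real nutch-site.xml may carry other properties that should be kept.

```python
import xml.etree.ElementTree as ET


def nutch_site_xml(searcher_dir):
    """Build a minimal configuration document that sets searcher.dir."""
    root = ET.Element("configuration")
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = "searcher.dir"
    ET.SubElement(prop, "value").text = searcher_dir
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical crawl directories, one per web application.
    for webapp, crawl_dir in (("search1", "/data/crawl"), ("search2", "/data/crawl2")):
        print(webapp + "/WEB-INF/classes/nutch-site.xml:")
        print(nutch_site_xml(crawl_dir))
```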


----- Original Message ----
From: karthik085 <ka...@gmail.com>
To: nutch-user@lucene.apache.org
Sent: Friday, July 20, 2007 3:13:24 PM
Subject: Multiple Nuch Instances


1. Can I run multiple instances of nutch for crawling/indexing? I got mixed
opinions - some say yes and some say no. Can someone who has tried this
let me know? One guy said it is difficult because multiple nutch instances
have to use different ports.
2. If I can run multiple instances of nutch, can I run nutch v 0.7.2, nutch
0.9 and nutch-dev at the same time for crawling/indexing websites?

Please let me know. Thanks.


-- 
View this message in context: http://www.nabble.com/Multiple-Nuch-Instances-tf4119823.html#a11716837
Sent from the Nutch - User mailing list archive at Nabble.com.








       