Posted to user@nutch.apache.org by "Joe Reger, Jr." <jo...@joereger.com> on 2005/05/12 15:56:59 UTC
Nutch Control via Java with no Command Line?
First of all, thanks to everybody involved in Nutch. It looks wonderful and
I can't wait to apply what you've done.
Is it possible to run and control Nutch completely within Tomcat 5.0.28 and
Java 1.4.2 using no command line?
In other words, I'd like to avoid using the command line and instead call
the java classes directly on a scheduled or user-controlled basis from
Tomcat. From what I see in bin/nutch I should be able to replace the
command:
bin/nutch crawl urls -dir crawl.test -depth 3 >& crawl.log
with something like:
String[] args = { "urls", "-dir", "crawl.test", "-depth", "3" };
// ">& crawl.log" is shell redirection, not a CrawlTool argument, so it is left out.
// main is static, so it is called on the class rather than on an instance.
net.nutch.tools.CrawlTool.main(args);
Is this possible? Is this smart? What sort of issues will arise if I try
to run everything from Tomcat/Java?
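In the shell command, ">& crawl.log" is a redirection performed by the shell (csh or bash), not an argument to CrawlTool, so putting ">&" and "crawl.log" into args would not produce a log file. A minimal sketch of an in-process equivalent, assuming the crawl reports its progress on System.out/System.err: swap both streams to a file for the duration of the call and restore them afterwards. CrawlLogCapture, Task and runWithLog are hypothetical names. Two caveats: System.setOut and System.setErr are JVM-wide, so inside Tomcat any other thread that prints during the crawl also lands in crawl.log; and a java.util.logging ConsoleHandler keeps the System.err it saw when it was created, so if Nutch logs through such a handler, that output may not follow the swap.

```java
import java.io.FileOutputStream;
import java.io.PrintStream;

public class CrawlLogCapture {

    /** A unit of work that may throw, like a static main(String[]) method. */
    public interface Task {
        void run() throws Exception;
    }

    /**
     * Runs the task with System.out and System.err both sent to logFile
     * (the in-process counterpart of ">& logFile"), then restores them.
     */
    public static void runWithLog(String logFile, Task task) throws Exception {
        PrintStream oldOut = System.out;
        PrintStream oldErr = System.err;
        // Truncates an existing file, as ">&" does; autoflush on newline.
        PrintStream log = new PrintStream(new FileOutputStream(logFile), true);
        try {
            System.setOut(log);
            System.setErr(log);
            task.run();
        } finally {
            // Always restore the original streams, even if the task fails.
            System.setOut(oldOut);
            System.setErr(oldErr);
            log.close();
        }
    }

    public static void main(String[] args) throws Exception {
        runWithLog("crawl.log", new Task() {
            public void run() throws Exception {
                // With Nutch on the classpath this would be, for example:
                // net.nutch.tools.CrawlTool.main(new String[] {
                //     "urls", "-dir", "crawl.test", "-depth", "3" });
                System.out.println("crawl output goes here");
                System.err.println("and so do warnings");
            }
        });
    }
}
```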
Thanks,
Joe Reger
Re: Nutch Control via Java with no Command Line?
Posted by Andrzej Bialecki <ab...@getopt.org>.
First of all, it's not only perfectly possible; it's actually how
CrawlTool itself is implemented. Please take a look at CrawlTool.main.
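Since the question is about starting crawls from Tomcat on a scheduled or user-controlled basis, two practical details are keeping the crawl off the servlet request thread and never running two crawls into the same directory at once. A minimal sketch, assuming Java 5 or later for java.util.concurrent (Java 1.4.2 does not have it); CrawlRunner, trigger, isRunning and shutdown are hypothetical names, and the Runnable passed in would wrap the CrawlTool.main call.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Runs at most one crawl at a time, on a background thread instead of the request thread. */
public class CrawlRunner {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final AtomicBoolean running = new AtomicBoolean(false);

    /**
     * Submits the crawl unless one is already in progress.
     * Returns true if a new crawl was submitted, false if it was rejected.
     */
    public boolean trigger(final Runnable crawl) {
        if (!running.compareAndSet(false, true)) {
            return false; // a crawl is already running; reject this request
        }
        try {
            worker.submit(new Runnable() {
                public void run() {
                    try {
                        // Exceptions thrown here are captured by the returned Future,
                        // which this sketch ignores, so log them inside 'crawl'.
                        crawl.run();
                    } finally {
                        running.set(false); // allow the next crawl
                    }
                }
            });
        } catch (RuntimeException e) {
            running.set(false); // e.g. rejected after shutdown()
            throw e;
        }
        return true;
    }

    public boolean isRunning() {
        return running.get();
    }

    /** Stops accepting work and waits up to one minute for a running crawl to finish. */
    public void shutdown() throws InterruptedException {
        worker.shutdown();
        worker.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

For a fixed schedule, a ScheduledExecutorService (or java.util.Timer on Java 1.4.2) can simply call trigger(...) periodically; because trigger refuses to start a second crawl while one is running, a slow scheduled crawl and a manual request cannot overlap. Calling shutdown() when the web application stops keeps the worker thread from outliving it.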
The issues... Well, you need to keep in mind that most Nutch processing
tasks consume a lot of resources, so if you run a task in the same JVM
instance as the whole app server, it can exhaust some resource
(file handles, heap space, CPU/IO, etc.) and starve the other applications
running in the same JVM.
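One way to act on this is to keep the crawl out of Tomcat's JVM altogether and start it as a child process with its own heap limit, with stdout and stderr merged into crawl.log the way ">&" does in the shell. A minimal sketch, assuming Java 7 or later (ProcessBuilder appeared in Java 5 and its redirect methods in Java 7; on Java 1.4.2 you would use Runtime.exec and copy the process output to the file yourself). ExternalCrawl, buildCommand and start are hypothetical names; the heap size and the classpath (whatever bin/nutch would normally put on it) are assumptions you would fill in. Note this isolates the JVM's heap and file handles, but CPU and disk are still shared with Tomcat on the same machine.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Launches the crawl in its own JVM so it cannot exhaust Tomcat's heap or file handles. */
public class ExternalCrawl {

    /** Builds: javaBin -Xmx<heap> -cp <classpath> net.nutch.tools.CrawlTool <crawlArgs...> */
    public static List<String> buildCommand(String javaBin, String heap,
                                            String classpath, String... crawlArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Xmx" + heap);   // heap cap for the crawl JVM only
        cmd.add("-cp");
        cmd.add(classpath);       // assumed: Nutch jar, its lib jars, conf dir
        cmd.add("net.nutch.tools.CrawlTool");
        cmd.addAll(Arrays.asList(crawlArgs));
        return cmd;
    }

    /** Starts the process with stdout and stderr merged into logFile, like ">& crawl.log". */
    public static Process start(List<String> cmd, File workDir, File logFile) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.directory(workDir);          // relative paths like "urls" resolve here
        pb.redirectErrorStream(true);   // merge stderr into stdout
        pb.redirectOutput(logFile);     // truncates and writes to logFile
        return pb.start();
    }
}
```

From Tomcat you would typically call start(...) from a background thread, using System.getProperty("java.home") + "/bin/java" as the Java binary, and then waitFor() on the returned Process to check its exit code.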
--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com