Posted to dev@nutch.apache.org by Ken Krugler <kk...@transpac.com> on 2005/07/18 02:13:19 UTC

Deploying crawl-only development version of Nutch

Hi all,

What's the best way to deploy a customized version of Nutch on a 
server, where it only crawls/indexes (no search support)?

The .war Ant build bundles up a bunch of stuff we don't need, and 
sticks things in .jsp-specific directories.

But an initial quick attempt at hacking up an Ant build to create a 
deploy folder with just the .jars we need has met with numerous 
problems, ranging from classpath-related stuff (works in the main 
.jar manifest, not on the command line) to head scratching over why 
nutch.jar includes only the nutch-default.xml & nutch-site.xml conf 
files (e.g. it doesn't have regex-urlfilter.txt, which we need), 
while nutch.war has a bigger set (including these three).

So is the best approach to just modify Nutch's .war Ant build for our 
purposes, even though we're using Eclipse to build/debug portions of 
the code?

Thanks for any advice,

-- Ken
-- 
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-470-9200

Re: Deploying crawl-only development version of Nutch

Posted by Piotr Kosiorowski <pk...@gmail.com>.
Hello Ken,
"ant tar" produces a full installation of Nutch. It also includes the *.war 
file, but you do not have to use it if you do not plan to deploy the search 
frontend. Most of the other directories it includes are still important: 
bin for the nutch shell script, conf for the configuration files, and plugins 
for the Nutch plugins. I would use the standard Nutch tar file as the 
installation in your case (maybe throwing away the nutch*.war file if you 
really want to).
Regards
Piotr
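
The pruning step suggested above could be scripted roughly as follows. This is 
only a minimal sketch, assuming the tar file from "ant tar" has already been 
unpacked into some directory. The function name prune_search_frontend and the 
REQUIRED_DIRS tuple are hypothetical names made up for this illustration; they 
are not part of Nutch. The sketch deletes any nutch*.war file at the top level 
of the install and reports which of bin, conf and plugins are missing.

```python
from pathlib import Path

# Directories the reply names as important for a crawl-only install.
# (Hypothetical constant for this sketch, not something Nutch defines.)
REQUIRED_DIRS = ("bin", "conf", "plugins")


def prune_search_frontend(install_dir):
    """Remove nutch*.war from an unpacked install directory.

    Returns the names of required directories that are missing, so the
    caller can tell whether the install still looks usable for crawling.
    """
    root = Path(install_dir)

    # Only the top level is searched; the .war is assumed to sit there.
    for war in root.glob("nutch*.war"):
        war.unlink()

    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
```

Calling prune_search_frontend on the unpacked directory and getting back an 
empty list would suggest that the pieces mentioned above are still in place; 
any names returned point at directories that would need to be restored before 
running the nutch shell script.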
