Posted to user@nutch.apache.org by "Trym B. Asserson" <tr...@creuna.no> on 2006/09/13 12:40:20 UTC

Question about using Nutch plug-ins as libraries

Hello,

We're currently developing an application that uses the Lucene API to
build a search engine, and part of the application is a component for
parsing several file formats. For this component we wanted to reuse
several of Nutch's parser plug-ins, so we have written classes in our
application that build a map of Parsers, one for each file format we
need to support.

Everything works fine when we run our unit tests: all of Nutch's
plug-ins load successfully and we can parse every file format we need.
However, we run into a problem when we deploy the application on our app
server. We have chosen Glassfish, and after deployment the Nutch plug-ins
cannot be configured: the PluginManifestParser apparently cannot find the
plugin folder bundled in our WAR, and the ParsePluginsReader cannot find
the parse-plugins.xml file. See the log warnings below:

[#|2006-09-12T14:01:49.944+0200|INFO|sun-appserver-ee9.1|javax.enterprise.system.stream.out|_ThreadID=10;_ThreadName=main;|WARN - PluginManifestParser.getPluginFolder(126) | Plugins: directory not found: /WEB-INF/lib/plugins

[#|2006-09-12T14:01:49.959+0200|INFO|sun-appserver-ee9.1|javax.enterprise.system.stream.out|_ThreadID=10;_ThreadName=main;|WARN - ParsePluginsReader.parse(115) | Unable to parse [null].Reason is [java.net.MalformedURLException]
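The first warning names /WEB-INF/lib/plugins, a path that only makes sense relative to the web application, not to the file system. To see how such a name behaves inside the container, here is a hypothetical, JDK-only diagnostic (PluginFolderCheck is not part of Nutch). It approximates, as an assumption rather than a copy of Nutch's code, the kind of lookup the warning suggests: an absolute path is taken literally, and any other name is tried as a class-path resource and then relative to the JVM's working directory, which inside an app server is usually not the exploded webapp root.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;

public class PluginFolderCheck {

    /**
     * Hypothetical helper (not Nutch code): resolves a plugin folder name the
     * way the "Plugins: directory not found" warning suggests, as an assumption.
     */
    public static File resolve(String name, ClassLoader loader) {
        File dir = new File(name);
        if (dir.isAbsolute()) {
            // Absolute paths are taken literally from the file system root.
            return dir.isDirectory() ? dir : null;
        }
        URL url = (loader == null) ? null : loader.getResource(name);
        if (url == null) {
            // Fall back to the JVM working directory, which inside an app
            // server is usually not the exploded webapp root.
            return dir.isDirectory() ? dir.getAbsoluteFile() : null;
        }
        if (!"file".equals(url.getProtocol())) {
            // e.g. a jar: URL; a folder inside an archive cannot be used as a File.
            return null;
        }
        try {
            return new File(url.toURI()); // handles %20 escapes and Windows drive letters
        } catch (URISyntaxException | IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "plugins";
        File dir = resolve(name, PluginFolderCheck.class.getClassLoader());
        System.out.println(dir == null
                ? "not found: " + name + " (working dir: " + new File(".").getAbsolutePath() + ")"
                : "found: " + dir.getAbsolutePath());
    }
}
```

Running it once under the unit-test setup and once from inside the deployed application would show whether the working directory or the class path differs between the two environments.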

On Glassfish we have so far simply deployed the WAR by dropping it into
the \autodeploy directory, which deploys it into the j2ee-modules folder.
The Nutch files end up in the locations below, with Glassfish's
<glassfish-domain> folder as the root of the directory structure:

<glassfish-domain>
 - applications
 --- j2ee-modules
 ------ <app-context-root>
 --------- conf (the Nutch configuration files from $NUTCH_HOME\conf; we have put them both here and in WEB-INF below)
 --------- WEB-INF (the same Nutch configuration files from $NUTCH_HOME\conf, also copied here)
 ------------ lib
 --------------- plugins (all the Nutch plug-in folders, e.g. parse-html, parse-pdf, plus every folder those plug-ins depend on)
 - autodeploy (we drop the WAR here and it is deployed into <app-context-root> above)
 - lib
I feel I have tried every combination of placing the libraries and
configuration files in different folders, but the ParsePluginsReader
never seems able to find the parse-plugins.xml file.
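In a WAR, the webapp class loader sees only WEB-INF/classes and the JARs in WEB-INF/lib; files placed directly in WEB-INF/ or in a conf/ folder next to it are not class-path resources. If parse-plugins.xml is looked up through the class loader (an assumption here, though the [null] in the warning is consistent with a lookup that returned nothing), neither location in the tree above would be found. The hypothetical, JDK-only ConfigResourceCheck below reports whether each file name resolves on the current thread's context class loader; nutch-default.xml and nutch-site.xml are included as assumed examples of other Nutch configuration files.

```java
import java.net.URL;

public class ConfigResourceCheck {
    // parse-plugins.xml comes from the post; the other two names are assumed
    // examples of Nutch configuration files worth checking as well.
    static final String[] NAMES = {"parse-plugins.xml", "nutch-default.xml", "nutch-site.xml"};

    /** Returns the class-path URL of a resource, or null if the loader cannot see it. */
    public static URL locate(String name, ClassLoader loader) {
        return loader == null ? null : loader.getResource(name);
    }

    public static void main(String[] args) {
        // Inside a webapp this is normally the container's webapp class loader.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (String name : NAMES) {
            URL url = locate(name, loader);
            System.out.println(name + " -> " + (url == null ? "NOT on the class path" : url));
        }
    }
}
```

Calling the same logic from code running inside the deployed application (for example a servlet) shows what the container actually exposes; if a file is reported as missing, copying it into WEB-INF/classes is one thing worth trying.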

Does anyone have experience deploying on Glassfish, or general tips on
how we can configure our application to use the plug-ins?


Thanking you in anticipation,

Trym

--
Trym Asserson

Re: Question about using Nutch plug-ins as libraries

Posted by Dennis Kubes <nu...@dragonflymc.com>.
Is the plugins folder in the root of the war?

Dennis
