Posted to user@nutch.apache.org by Edoardo Causarano <ed...@gmail.com> on 2014/09/12 12:11:29 UTC

Plugin loading and NUTCH-609

Hi all,

I'm completely lost, can anyone help me out here?

I have a job.jar that contains all the Nutch code, dependencies and plugins, and I don't understand why I keep getting this error:

2014-09-12 11:51:04,458 WARN [main] org.apache.nutch.plugin.PluginRepository: Plugins: not a file: url. Can't load plugins from: jar:file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/oracle/appcache/application_1410512500237_0003/filecache/10/job.jar/job.jar!/lib/plugins

Ok, I have to admit that I'm mucking around with the project structure, but from NUTCH-609 it seems that the PluginManifestParser does not support loading plugins from the job payload itself. Is this the case? If so, can anyone tell me where I need to unpack these plugins so that the loader will pick them up?
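The warning itself points at the URL scheme. A loader that walks plugin folders on disk can treat a `file:` URL as a directory, but a `jar:` URL names entries inside an archive, and those cannot be listed as a directory with `java.io.File`. The sketch below is not Nutch's `PluginRepository` code. It only illustrates the distinction with a hypothetical `PluginDirCheck` class and made-up `/tmp` paths.

```java
import java.io.File;
import java.net.URI;

public class PluginDirCheck {

    /** Returns a directory for a file: URI, or null for any other scheme such as jar:. */
    static File toPluginDir(URI location) {
        if (!"file".equals(location.getScheme())) {
            // Entries inside a jar: URI are not files on disk, so there is no folder to list.
            return null;
        }
        return new File(location);
    }

    public static void main(String[] args) {
        // Hypothetical locations: one still packed in the job jar, one unpacked by Hadoop.
        URI insideJar = URI.create("jar:file:/tmp/job.jar!/lib/plugins");
        URI unpacked = URI.create("file:/tmp/job.jar/classes/plugins");

        System.out.println(toPluginDir(insideJar)); // null
        System.out.println(toPluginDir(unpacked));  // a File pointing at .../classes/plugins
    }
}
```

In other words, the plugins have to end up somewhere the class loader resolves to a plain directory, not only inside the packed jar.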

Side note: is anyone interested in overhauling this loading mechanism? The XML manifest could be replaced with an annotation class, although I would be happy enough if I could just include the manifest in the jar and load it from there.
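To make the annotation idea concrete, here is a minimal sketch of a plugin class that carries its own metadata instead of a plugin.xml. Everything in it is hypothetical: Nutch has no `@PluginDescriptor` annotation, and the fields are only a guess at what such an annotation might hold. The key detail is `RetentionPolicy.RUNTIME`, without which the loader could not read the annotation reflectively.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class AnnotationManifestSketch {

    /** Hypothetical stand-in for the fields a plugin.xml manifest declares. */
    @Retention(RetentionPolicy.RUNTIME) // keep it in the class file and visible to reflection
    @Target(ElementType.TYPE)
    @interface PluginDescriptor {
        String id();
        String name();
        String version() default "1.0.0";
        String extensionPoint();
    }

    /** Hypothetical plugin class describing itself. */
    @PluginDescriptor(id = "parse-example", name = "Example Parser",
                      extensionPoint = "org.apache.nutch.parse.Parser")
    static class ExampleParser { }

    /** Reads the descriptor from a loaded class, or returns null if it has none. */
    static PluginDescriptor describe(Class<?> pluginClass) {
        return pluginClass.getAnnotation(PluginDescriptor.class);
    }

    public static void main(String[] args) {
        PluginDescriptor d = describe(ExampleParser.class);
        System.out.println(d.id() + " " + d.version() + " -> " + d.extensionPoint());
    }
}
```

A loader built this way would still need to know which classes to inspect, so it would probably complement a small index file rather than remove discovery altogether.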


Best,
Edoardo

Re: Plugin loading and NUTCH-609

Posted by Edoardo Causarano <ed...@gmail.com>.
On 15 Sep 2014, at 11:36, Julien Nioche <li...@gmail.com> wrote:

Hi Julien,

see my inline replies

> Hi Edoardo,
> 
> See my comments below
> 
> On 12 September 2014 11:11, Edoardo Causarano <ed...@gmail.com>
> wrote:
> 
>> Hi all,
>> 
>> I'm completely lost, can anyone help me out here?
>> 
>> I have this job.jar which contains all Nutch code, dependencies and
>> plugins. I don't understand why I keep getting this error:
>> 
>> 2014-09-12 11:51:04,458 WARN [main]
>> org.apache.nutch.plugin.PluginRepository: Plugins: not a file: url. Can't
>> load plugins from:
>> jar:file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/oracle/appcache/application_1410512500237_0003/filecache/10/job.jar/job.jar!/lib/plugins
>> 
>> Ok, I have to admit that I'm mucking around with the project structure,
>> but from NUTCH-609 it seems that the PluginManifestParser does not
>> support loading plugins from the job payload itself. Is this the case? If
>> so, can anyone tell me where I need to unpack these plugins so that the
>> loader will pick them up?
>> 
> 
> Why do you build the job jar yourself instead of using the one that our Ant
> script builds? If you look at it, the plugins are in /classes/plugins/
> within the jar.

Well, basically because I'm not familiar at all with Ivy and wanted to dive into the tool to understand a bit how it worked :) But yes, I solved the issue by correcting the target folder in the Maven assembly descriptor, so that Hadoop unpacks this folder as well.
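When assembling the job jar by hand, a quick way to confirm the layout Julien describes is to list every plugin.xml entry and check that it sits under `classes/plugins/`. The sketch below assumes only that each plugin directory contains a `plugin.xml`; the class name `JobJarLayoutCheck` is made up for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JobJarLayoutCheck {

    /** Lists every plugin.xml entry in the jar, e.g. "classes/plugins/parse-html/plugin.xml". */
    static List<String> findManifests(JarFile jar) {
        List<String> manifests = new ArrayList<>();
        for (JarEntry entry : Collections.list(jar.entries())) {
            if (!entry.isDirectory() && entry.getName().endsWith("/plugin.xml")) {
                manifests.add(entry.getName());
            }
        }
        return manifests;
    }

    /** True if the manifest sits where the Ant-built job jar keeps plugins. */
    static boolean inExpectedFolder(String entryName) {
        return entryName.startsWith("classes/plugins/");
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JobJarLayoutCheck path/to/job.jar
        try (JarFile jar = new JarFile(args[0])) {
            for (String name : findManifests(jar)) {
                System.out.println((inExpectedFolder(name) ? "ok        " : "MISPLACED ") + name);
            }
        }
    }
}
```

Running it against both the Ant-built jar and a Maven-assembled one makes any difference in the plugin folder easy to spot.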

>> Side note: is anyone interested in overhauling this loading mechanism? The
>> XML manifest could be replaced with an annotation class, although I would
>> be happy enough if I could just include the manifest in the jar and load it from there.
>> 
> 
> I like the idea of replacing the XML manifest with annotations - or maybe
> initially allow both. In an ideal world plugins would be handled as
> dependencies and we could just get the jars for them. I am sure there would
> be a way of making the XML file a part of the artefact but if we don't have
> to and can have a pom and a jar then it would certainly be simpler.
> 
> Feel free to open a new JIRA for this and contribute a patch if you can.

I was already looking into that, and had to hack away a bit at the plugin manifest parser. It seems to work alright, but then it explodes at runtime when loading the class (an NPE in the classloader, if I remember correctly).

I mixed several changes, so I'll have to clean up and organize my thoughts. ;) I was thinking the following would work: plugin jars stay where they are, plugin.xml moves into the jar under "META-INF/nutch", and the loader iterates over the plugin paths, parses each XML manifest and loads the classes it declares. Does the plugin also need to export "lib" folders?
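The flow proposed above could look roughly like the sketch below: one `URLClassLoader` per plugin jar, the manifest read from `META-INF/nutch/plugin.xml` inside that jar, and each declared class loaded through the plugin's own loader. This is a hypothetical illustration, not Nutch's `PluginManifestParser`. It assumes the manifest keeps the existing plugin.xml shape, where extensions list `<implementation class="...">` elements, and it ignores `<runtime>`/`<library>` exports entirely.

```java
import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class JarManifestLoader {

    // Hypothetical manifest location proposed in this thread.
    static final String MANIFEST = "META-INF/nutch/plugin.xml";

    /** Collects the class attribute of every <implementation> element in a manifest. */
    static List<String> implementationClasses(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList impls = doc.getElementsByTagName("implementation");
        List<String> classes = new ArrayList<>();
        for (int i = 0; i < impls.getLength(); i++) {
            classes.add(((Element) impls.item(i)).getAttribute("class"));
        }
        return classes;
    }

    /** Opens each plugin jar in its own loader, reads its manifest and loads the declared classes. */
    static List<Class<?>> loadAll(List<File> pluginJars) throws Exception {
        List<Class<?>> loaded = new ArrayList<>();
        for (File jar : pluginJars) {
            // Explicit parent so plugin classes can see the extension-point interfaces.
            URLClassLoader loader = new URLClassLoader(
                    new URL[] { jar.toURI().toURL() }, JarManifestLoader.class.getClassLoader());
            URL manifest = loader.findResource(MANIFEST); // searches this jar only, not the parent
            if (manifest == null) {
                loader.close(); // not a plugin jar
                continue;
            }
            try (InputStream in = manifest.openStream()) {
                for (String className : implementationClasses(in)) {
                    loaded.add(Class.forName(className, true, loader));
                }
            }
            // The loader stays open because the classes loaded from it still depend on it.
        }
        return loaded;
    }

    public static void main(String[] args) throws Exception {
        List<File> jars = new ArrayList<>();
        for (String path : args) {
            jars.add(new File(path));
        }
        for (Class<?> c : loadAll(jars)) {
            System.out.println("loaded " + c.getName());
        }
    }
}
```

Keeping one loader per jar mirrors the isolation that separate plugin folders give today. Whether a plugin must also export bundled "lib" jars would decide if each loader needs extra URLs beyond the plugin jar itself.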


Best,
Edoardo

> Thanks
> 
> Julien
> -- 
> 
> Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble


Re: Plugin loading and NUTCH-609

Posted by Julien Nioche <li...@gmail.com>.
Hi Edoardo,

See my comments below

On 12 September 2014 11:11, Edoardo Causarano <ed...@gmail.com>
wrote:

> Hi all,
>
> I'm completely lost, can anyone help me out here?
>
> I have this job.jar which contains all Nutch code, dependencies and
> plugins. I don't understand why I keep getting this error:
>
> 2014-09-12 11:51:04,458 WARN [main]
> org.apache.nutch.plugin.PluginRepository: Plugins: not a file: url. Can't
> load plugins from:
> jar:file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/oracle/appcache/application_1410512500237_0003/filecache/10/job.jar/job.jar!/lib/plugins
>
> Ok, I have to admit that I'm mucking around with the project structure,
> but from NUTCH-609 it seems that the PluginManifestParser does not
> support loading plugins from the job payload itself. Is this the case? If
> so, can anyone tell me where I need to unpack these plugins so that the
> loader will pick them up?
>

Why do you build the job jar yourself instead of using the one that our Ant
script builds? If you look at it, the plugins are in /classes/plugins/
within the jar.


> Side note: is anyone interested in overhauling this loading mechanism? The
> XML manifest could be replaced with an annotation class, although I would
> be happy enough if I could just include the manifest in the jar and load it from there.
>

I like the idea of replacing the XML manifest with annotations - or maybe
initially allow both. In an ideal world plugins would be handled as
dependencies and we could just get the jars for them. I am sure there would
be a way of making the XML file a part of the artefact but if we don't have
to and can have a pom and a jar then it would certainly be simpler.

Feel free to open a new JIRA for this and contribute a patch if you can.

Thanks

Julien