Posted to dev@nutch.apache.org by Sujen Shah <su...@apache.org> on 2016/09/12 21:56:50 UTC

Plugin dependencies do not get added to classpath while running Nutch in local mode

Hi Devs,

I am having trouble loading the jars that plugins require when running
Nutch in local mode.

I am doing the following:
1. add a dependency in <some-plugin>/ivy.xml
2. ant clean runtime

When I print the classpath before running, the bin/nutch script does not
seem to add those jars to the classpath, and the job throws runtime
exceptions. As a workaround, I added the dependency to the root ivy.xml.

I don't know whether I am missing something, or whether anyone else has
faced the same issue and found a solution.
For example, for
https://github.com/apache/nutch/tree/master/src/plugin/publish-rabbitmq
the amqp-client dependency also had to be added to the root ivy.xml to
avoid runtime exceptions (e.g., ClassNotFound).

I have created a patch that modifies the ./bin/nutch script to load the
plugin jars onto the classpath; it is attached below. The patch
eliminates the need to modify the root ivy.xml for plugin-specific
dependencies.

Before filing a JIRA issue, I wanted to ask the devs whether a solution
already exists. If not, I'll submit the patch through JIRA.

Thank you for your help.


Regards,
Sujen Shah

Re: Plugin dependencies do not get added to classpath while running Nutch in local mode

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi Sujen,

Strange: other Kafka classes are found (ConfigDef, ProducerConfig, ConfigException), but
DefaultPartitioner, which is contained in the same jar, is not. I need to check in the Kafka code how
the partitioner classes are loaded. It could be that Kafka loads classes dynamically in a way that is
incompatible with Nutch's plugin class loader design. We had a similar problem with the Rome library,
cf. NUTCH-1494 and NUTCH-1893, and https://github.com/rometools/rome/issues/130
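
The observation that some Kafka classes were loaded can be read directly off the stack trace: a class that appears in an "at ..." frame was loaded and was executing. A minimal Python sketch of that check follows. The helper name classes_in_frames and the shortened trace excerpt are illustrative only and are not part of Nutch or Kafka.

```python
import re

# Shortened excerpt of the trace posted in this thread (frames abbreviated).
TRACE_EXCERPT = """\
java.lang.Exception: java.lang.ExceptionInInitializerError
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
Caused by: java.lang.ExceptionInInitializerError
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:188)
at org.apache.nutch.publisher.kafka.KafkaPublisherImpl.setConfig(KafkaPublisherImpl.java:70)
Caused by: org.apache.kafka.common.config.ConfigException: Invalid value
at org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:672)
at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:110)
at org.apache.kafka.clients.producer.ProducerConfig.<clinit>(ProducerConfig.java:222)
"""

# Matches "at <fully.qualified.Class>.<method>(" and captures the class name.
FRAME = re.compile(r"^\s*at ([\w.$]+)\.[\w$<>]+\(")


def classes_in_frames(trace, package_prefix):
    """Return the distinct classes under package_prefix that own a stack frame."""
    classes = []
    for line in trace.splitlines():
        match = FRAME.match(line)
        if match:
            name = match.group(1)
            if name.startswith(package_prefix) and name not in classes:
                classes.append(name)
    return classes


if __name__ == "__main__":
    for name in classes_in_frames(TRACE_EXCERPT, "org.apache.kafka."):
        print("loaded:", name)
```

The class named in the error message (DefaultPartitioner) never shows up in a frame, which fits the picture of a class that lives in the same jar but is looked up through a different class loader.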

> Thanks for your help and comments on https://github.com/apache/nutch/pull/152

Thanks for your patience. Keeping the base class path (without plugins) as lean as possible is
important. Dependency conflicts are ugly to resolve; see NUTCH-2316 for a recent problem.

Thanks,
Sebastian



Re: Plugin dependencies do not get added to classpath while running Nutch in local mode

Posted by Sujen Shah <su...@apache.org>.
Hi Sebastian,

Here is the complete log trace from the hadoop.log file:

2016-09-25 19:14:08,455 INFO  fetcher.FetchItemQueues - Using queue mode : byHost
2016-09-25 19:14:08,455 INFO  fetcher.Fetcher - Fetcher: threads: 50
2016-09-25 19:14:08,455 INFO  fetcher.Fetcher - Fetcher: time-out divisor: 2
2016-09-25 19:14:08,459 INFO  fetcher.QueueFeeder - QueueFeeder finished: total 3 records + hit by time limit :0
2016-09-25 19:14:08,559 INFO  net.URLExemptionFilters - Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2016-09-25 19:14:08,570 INFO  fetcher.FetcherThreadPublisher - Setting up publishers
2016-09-25 19:14:08,587 WARN  mapred.LocalJobRunner - job_local1447446310_0001
java.lang.Exception: java.lang.ExceptionInInitializerError
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ExceptionInInitializerError
    at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:188)
    at org.apache.nutch.publisher.kafka.KafkaPublisherImpl.setConfig(KafkaPublisherImpl.java:70)
    at org.apache.nutch.publisher.NutchPublishers.setConfig(NutchPublishers.java:44)
    at org.apache.nutch.fetcher.FetcherThreadPublisher.<init>(FetcherThreadPublisher.java:40)
    at org.apache.nutch.fetcher.FetcherThread.<init>(FetcherThread.java:174)
    at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:213)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kafka.common.config.ConfigException: Invalid value org.apache.kafka.clients.producer.internals.DefaultPartitioner for configuration partitioner.class: Class org.apache.kafka.clients.producer.internals.DefaultPartitioner could not be found.
    at org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:672)
    at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:110)
    at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:132)
    at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:171)
    at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:333)
    at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:346)
    at org.apache.kafka.clients.producer.ProducerConfig.<clinit>(ProducerConfig.java:222)
    ... 14 more
2016-09-25 19:14:09,346 ERROR fetcher.Fetcher - Fetcher: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865)
    at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:484)
    at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:519)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:493)

Thanks for your help and comments on https://github.com/apache/nutch/pull/152.


Re: Plugin dependencies do not get added to classpath while running Nutch in local mode

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi Sujen,

Could you send the complete stack trace? I just want to be sure where the error originates.

> I looked at the code here https://github.com/apache/nutch/blob/master/src/bin/nutch#L155-L164
> and cannot understand the use
> of lines 161-163, if the plugins folder is found add the home directory to the classpath ?

In a local installation, $NUTCH_HOME ("runtime/local") is added to the classpath because the
"plugins" folder named in the "plugin.folders" property is located there ("runtime/local/plugins"); see:

<property>
  <name>plugin.folders</name>
  <value>plugins</value>
  <description>Directories where nutch plugins are located.  Each
  element may be a relative or absolute path.  If absolute, it is used
  as is.  If relative, it is searched for on the classpath.</description>
</property>
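
The "searched for on the classpath" rule in the description above can be modelled in a few lines. This is a simplified Python sketch under stated assumptions: it only considers directory entries on the classpath, the function name resolve_plugin_folder is hypothetical, and the real lookup is done in Java inside Nutch.

```python
import tempfile
from pathlib import Path


def resolve_plugin_folder(folder, classpath_dirs):
    """Resolve one plugin.folders element the way its description states.

    An absolute path is used as is; a relative path is looked up under each
    classpath directory in order. Returns a Path, or None if nothing matches.
    """
    path = Path(folder)
    if path.is_absolute():
        return path if path.is_dir() else None
    for entry in classpath_dirs:
        candidate = Path(entry) / path
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical local layout: runtime/local/{conf,plugins}
        home = Path(tmp) / "runtime" / "local"
        (home / "plugins").mkdir(parents=True)
        (home / "conf").mkdir()
        # With $NUTCH_HOME on the classpath the relative folder resolves ...
        print(resolve_plugin_folder("plugins", [home / "conf", home]))
        # ... and without it the lookup finds nothing.
        print(resolve_plugin_folder("plugins", [home / "conf"]))
```

Under this model, putting $NUTCH_HOME on the classpath serves to locate the "plugins" directory; it does not by itself place the jars inside that directory on the classpath.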

See also my comments on https://github.com/apache/nutch/pull/152

Sebastian




Re: Plugin dependencies do not get added to classpath while running Nutch in local mode

Posted by Sujen Shah <su...@apache.org>.
Thank you Sebastian for your response.

I followed your suggestion and added the required jars under runtime in
plugin.xml. My code is at
https://github.com/sujen1412/nutch/blob/kafka/src/plugin/publish-kafka/plugin.xml.

Now, after compiling and running ./bin/crawl in local mode, the fetch job
fails with:

Caused by: org.apache.kafka.common.config.ConfigException: Invalid value
org.apache.kafka.clients.producer.internals.DefaultPartitioner for
configuration partitioner.class: Class
org.apache.kafka.clients.producer.internals.DefaultPartitioner could not be found.

Am I missing something?

To find the cause, I copied the jars from
runtime/local/plugin/<some-plugin>/*.jar to the runtime/local/lib
directory. With that change the code works fine, which may imply that the
jars listed under the runtime tag in plugin.xml are not being added to the
classpath at runtime.

I looked at the code at
https://github.com/apache/nutch/blob/master/src/bin/nutch#L155-L164
and cannot understand the purpose of lines 161-163: if the plugins folder
is found, the home directory is added to the classpath?
I also looked into the various ways to set a classpath
(https://docs.oracle.com/javase/8/docs/technotes/tools/windows/classpath.html#A1100762);
the documentation says that subdirectories are not searched recursively.
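
The "not searched recursively" rule can be illustrated with a small model of how class path entries map to jar files. This is a Python sketch of the documented JVM behaviour, not of the bin/nutch script; the function name visible_jars and the directory layout are made up for the example.

```python
import tempfile
from pathlib import Path

JAR_SUFFIXES = (".jar", ".JAR")


def visible_jars(classpath_entries):
    """Model which jar files a JVM opens for the given class path entries.

    - an entry naming a jar file contributes that jar
    - an entry whose last component is "*" contributes the jars located
      directly in that directory (subdirectories are not searched)
    - a plain directory entry contributes loose classes and resources only
    """
    jars = []
    for entry in classpath_entries:
        path = Path(entry)
        if path.name == "*":
            if path.parent.is_dir():
                jars.extend(sorted(
                    p for p in path.parent.iterdir()
                    if p.is_file() and p.suffix in JAR_SUFFIXES))
        elif path.is_file() and path.suffix in JAR_SUFFIXES:
            jars.append(path)
    return jars


def make_layout(base_dir):
    """Create a throwaway layout resembling a local runtime directory."""
    home = Path(base_dir) / "local"
    (home / "lib").mkdir(parents=True)
    (home / "plugins" / "publish-example").mkdir(parents=True)
    (home / "lib" / "core.jar").touch()
    (home / "plugins" / "publish-example" / "client.jar").touch()
    return home


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        home = make_layout(tmp)
        # The home directory plus lib/* exposes lib jars but no plugin jars.
        for jar in visible_jars([str(home), str(home / "lib" / "*")]):
            print(jar.name)
```

In this model, a jar under plugins/<some-plugin>/ is reachable from the class path only if it is named explicitly or its directory is given with a wildcard, which is consistent with the copy-to-lib experiment making the error disappear.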

Thanks once again for your help.



Re: Plugin dependencies do not get added to classpath while running Nutch in local mode

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi Sujen,

Are the jars also listed in the plugin.xml?

That's required. The plugin-specific ivy.xml is used only at compile time,
to fetch the library and its dependencies and to get the plugin compiled.

At runtime, all required libs have to be listed in the plugin.xml, e.g.,
https://github.com/apache/nutch/blob/master/src/plugin/parse-tika/plugin.xml

This double work is not ideal and is a frequent cause of errors, but that's
how it works right now.
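
Because the jar list has to be maintained in two places, a small cross-check can catch a jar that was fetched into the plugin directory but never listed in plugin.xml. The following Python sketch assumes the plugin.xml layout used by the bundled plugins, that is <library name="..."/> elements under <runtime>. The function names and the example plugin "publish-example" are hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def declared_libraries(plugin_xml):
    """Return the jar names listed as <library name="..."/> under <runtime>."""
    root = ET.parse(plugin_xml).getroot()
    return {lib.get("name")
            for lib in root.findall("./runtime/library")
            if lib.get("name")}


def undeclared_jars(plugin_dir):
    """Return jars present in plugin_dir that plugin.xml does not declare."""
    plugin_dir = Path(plugin_dir)
    declared = declared_libraries(plugin_dir / "plugin.xml")
    present = {jar.name for jar in plugin_dir.glob("*.jar")}
    return sorted(present - declared)


def make_example_plugin(base_dir):
    """Create a throwaway plugin directory with one jar left undeclared."""
    plugin_dir = Path(base_dir) / "publish-example"
    plugin_dir.mkdir()
    (plugin_dir / "plugin.xml").write_text(
        '<plugin id="publish-example" name="Example" version="1.0.0">\n'
        "  <runtime>\n"
        '    <library name="publish-example.jar"><export name="*"/></library>\n'
        "  </runtime>\n"
        "</plugin>\n"
    )
    for jar in ("publish-example.jar", "example-client.jar"):
        (plugin_dir / jar).touch()
    return plugin_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Lists the dependency jar that is missing from plugin.xml.
        print(undeclared_jars(make_example_plugin(tmp)))
```

Pointed at a built plugin directory such as runtime/local/plugins/<some-plugin>, a check like this would list the jars that still need a <library> entry.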

Cheers,
Sebastian

