Posted to user@nutch.apache.org by Craig Tataryn <cr...@tataryn.net> on 2020/06/24 23:18:16 UTC
protocol-interactiveselenium Custom Handler
Hello, I would like to create my own Custom Handler for
protocol-interactiveselenium.
In reading the code [1] I see that when setting the config:
<property>
<name>interactiveselenium.handlers</name>
<value>NewCustomHandler,DefaultHandler</value>
<description></description>
</property>
the "NewCustomHandler" would be loaded from the classpath, assuming it was
called org.apache.nutch.protocol.interactiveselenium.handlers.NewCustomHandler.
However, my question is: how do I get Nutch to incorporate my new .jar file
containing the NewCustomHandler?
I've written protocol and indexer plugins before, but this seems a bit
different. An example of a custom handler that someone has written would be
great.
Thanks,
Craig.
[1] -
https://github.com/apache/nutch/blob/ea862f45b83177b41aebad9c18b900936d43a19a/src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/HttpResponse.java#L364
Re: protocol-interactiveselenium Custom Handler
Posted by Sebastian Nagel <wa...@googlemail.com.INVALID>.
Hi Craig,
If you're building Nutch from the git repo or from the source package,
the easiest way is to put the file NewCustomHandler.java into
src/plugin/protocol-interactiveselenium/src/java/.../handlers/
and run
ant runtime
to compile and package Nutch, including your custom handler.
Using a jar isn't as simple, mostly because of the classpath encapsulation
of Nutch plugins.
1. add your jar as a dependency to
src/plugin/protocol-interactiveselenium/ivy.xml
2. register the file name of the jar in
src/plugin/protocol-interactiveselenium/plugin.xml
as
<library name="xyz.jar"/>
3. build Nutch, see above
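As a rough sketch of steps 1 and 2, assuming the jar is published under the
hypothetical coordinates com.example:my-handlers:1.0.0 (these are placeholders,
not anything from this thread), the two entries might look roughly like this.
The conf mapping is an assumption as well; copy whatever the existing
dependencies in that ivy.xml use.

```xml
<!-- src/plugin/protocol-interactiveselenium/ivy.xml, inside <dependencies> -->
<!-- org, name and rev are hypothetical placeholders for your own artifact -->
<dependency org="com.example" name="my-handlers" rev="1.0.0" conf="*->default"/>

<!-- src/plugin/protocol-interactiveselenium/plugin.xml, inside <runtime> -->
<!-- the file name has to match the jar that ivy resolves for the dependency -->
<library name="my-handlers-1.0.0.jar"/>
```

The handler class inside the jar would presumably still have to live in the
package org.apache.nutch.protocol.interactiveselenium.handlers, since the
configured handler names are resolved against that package.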
Of course, ivy must be able to pick the jar from one of
the repositories listed in
ivy/ivysettings.xml
But it's possible to add your local Maven repo/cache by adding:
<property name="maven2.pattern.local"
value="${user.home}/.m2/repository/[organisation]/[module]/[revision]/[module]-[revision](-[classifier]).[ext]"
override="false" />
...
<resolvers>
...
<filesystem name="maven2-local" m2compatible="true" >
<artifact pattern="${maven2.pattern.local}"/>
<ivy pattern="${maven2.pattern.local}"/>
</filesystem>
...
</resolvers>
> An example of a custom handler that someone has written would be great.
There are some handler implementations in
src/plugin/protocol-interactiveselenium/src/java/.../handlers/
I've never used them, but they look "custom", at least
at first glance, because one file name includes a typo. :)
If you have time, please open a Jira issue at
https://issues.apache.org/jira/projects/NUTCH
to fix the naming.
Thanks,
Sebastian