Posted to user@nutch.apache.org by Dennis Spathis <ds...@gmail.com> on 2012/01/17 16:16:51 UTC

incompatible neko and xerces versions?

Hi,

The Nutch 1.4 distribution includes

 - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib-nekohtml)
 - xercesImpl-2.9.1.jar (under .../runtime/local/lib)

These two JAR versions appear to be incompatible. When the HtmlParser
(configured to use neko) is invoked during a local-mode crawl, the parse
fails with an AbstractMethodError. (Note: I discovered the
AbstractMethodError by rebuilding the HtmlParser plugin and adding a
catch(Throwable) clause to the getParse method to log the stack trace. With
the original code, the error is unhandled and results only in the
unhelpful log message "Unable to successfully parse content".)

I found that substituting a later, compatible version of nekohtml (1.9.11)
fixes the problem.

Curiously, and in support of the above, the nekohtml plugin.xml file in
Nutch 1.4 contains the following:

<plugin
   id="lib-nekohtml"
   name="CyberNeko HTML Parser"
   version="1.9.11"
   provider-name="org.cyberneko">

   <runtime>
       <library name="nekohtml-0.9.5.jar">
           <export name="*"/>
       </library>
   </runtime>
</plugin>

Note the conflicting version numbers (version tag is "1.9.11" but the
specified library is "nekohtml-0.9.5.jar").
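
For illustration only, a descriptor whose library entry matches its version
tag might look like the sketch below. It assumes the replacement JAR is named
nekohtml-1.9.11.jar and sits in the same lib-nekohtml plugin directory; both
are assumptions, not taken from the distribution.

```xml
<plugin
   id="lib-nekohtml"
   name="CyberNeko HTML Parser"
   version="1.9.11"
   provider-name="org.cyberneko">

   <runtime>
       <!-- Assumed file name: adjust to the JAR actually placed in
            the lib-nekohtml plugin directory. -->
       <library name="nekohtml-1.9.11.jar">
           <export name="*"/>
       </library>
   </runtime>
</plugin>
```

Only the library name differs from the descriptor shipped with Nutch 1.4.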

Was the 0.9.5 version included by mistake? Was the intention rather to
include 1.9.11?

I'm a Nutch newbie, so please forgive me if I'm missing something obvious
here... :)

Re: incompatible neko and xerces versions?

Posted by dspathis <ds...@gmail.com>.
Done. 

I created the following issue:
    https://issues.apache.org/jira/browse/NUTCH-1253


Re: incompatible neko and xerces versions?

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Dennis,

Could you open an issue on our Jira? This sounds like something we need to
document and catch.

Thanks very much for reporting.

Kind Regards

Lewis
