Posted to dev@nutch.apache.org by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2017/08/18 14:37:00 UTC

[jira] [Work started] (NUTCH-2316) Library conflict with Parser-Tika Plugin and Lib Folder

     [ https://issues.apache.org/jira/browse/NUTCH-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on NUTCH-2316 started by Sebastian Nagel.
----------------------------------------------
> Library conflict with Parser-Tika Plugin and Lib Folder
> -------------------------------------------------------
>
>                 Key: NUTCH-2316
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2316
>             Project: Nutch
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.11
>            Reporter: Christian Weber
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.14
>
>
> Hello Apache Nutch Team,
> Every time Nutch tries to parse a *.class file, this exception pops up:
> {quote}
> java.util.concurrent.ExecutionException: java.lang.IncompatibleClassChangeError: class org.apache.tika.parser.asm.XHTMLClassVisitor has interface org.objectweb.asm.ClassVisitor as super class
> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:206)
> 	at org.apache.nutch.parse.ParseUtil.runParser(ParseUtil.java:171)
> 	at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:95)
> 	at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:103)
> 	at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:45)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
> 	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IncompatibleClassChangeError: class org.apache.tika.parser.asm.XHTMLClassVisitor has interface org.objectweb.asm.ClassVisitor as super class
> 	at java.lang.ClassLoader.defineClass1(Native Method)
> 	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
> 	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> 	at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
> 	at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> 	at org.apache.tika.parser.asm.ClassParser.parse(ClassParser.java:51)
> 	at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:132)
> 	at de.qaware.qasearch.nutch.parse.SafetyTikaParser.getParse(SafetyTikaParser.java:73)
> 	at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:35)
> 	at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:24)
> 	... 4 more
> {quote}
> What I have found out:
> * The ASM library is used inside the Tika parser, and it changed incompatibly after version 3.x (see TIKA-1240): in ASM 4 and later, org.objectweb.asm.ClassVisitor is an abstract class instead of an interface.
> * The parse-tika plugin has the correct version inside its plugin folder (tika-parsers 1.11 uses asm 5.0.4, see http://mvnrepository.com/artifact/org.apache.tika/tika-parsers/1.11).
> * The nutch/lib folder contains an asm jar with version 3.3.1, which is incompatible.
> I guess the problem is that the JVM executing Nutch loads the asm library from the nutch/lib folder instead of the copy in the plugin folder.
> (This issue is related to NUTCH-2071. I hope it's okay that I opened a new issue.)
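
The error message itself points at the version mismatch. Tika's XHTMLClassVisitor extends ClassVisitor as a class, which is the ASM 4+ form. The ClassVisitor found at runtime, however, is an interface, which is what ASM 3.x provides.

One way to check the guess about nutch/lib is to ask the JVM where ClassVisitor is loaded from. The sketch below is a hypothetical diagnostic: the class name AsmOriginCheck is made up and is not part of Nutch. It uses only standard reflection calls and assumes it is run with the same classpath as the Nutch job.

```java
import java.net.URL;
import java.security.CodeSource;

/**
 * Hypothetical diagnostic (not part of Nutch): reports whether a class is an
 * interface or a class, and which jar it was loaded from. For
 * org.objectweb.asm.ClassVisitor, "interface" means ASM 3.x and
 * "class" means ASM 4 or later.
 */
public class AsmOriginCheck {

    /** Describes the named class as seen by the given loader (null = bootstrap loader). */
    static String describe(String className, ClassLoader loader) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> c = Class.forName(className, false, loader);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            URL location = (src == null) ? null : src.getLocation();
            String kind = c.isInterface() ? "an interface" : "a class";
            return className + " is " + kind + " loaded from "
                    + (location == null ? "<bootstrap/unknown>" : location.toString());
        } catch (ClassNotFoundException e) {
            return className + " not found";
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        System.out.println(describe("org.objectweb.asm.ClassVisitor", loader));
    }
}
```

Suppose the output says ClassVisitor is an interface loaded from an asm-3.3.1 jar. That would support the guess above. Note that this only shows the copy visible on the job classpath, which is also the copy a parent-first plugin class loader would find first.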

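The guess about nutch/lib matches the standard delegation model of java.lang.ClassLoader. loadClass first checks for an already-loaded class, then asks the parent loader, and only then calls findClass on its own URLs. The plugin class loader may keep that default parent-first order; this is an assumption here, not verified against Nutch's source. If it does, an asm jar on the job classpath wins over the one in the plugin folder.

A common workaround in plugin systems is a child-first loader, sketched below as a generic technique. The name ChildFirstClassLoader is hypothetical. This is not Nutch's PluginClassLoader and not the fix applied for this issue.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Generic child-first class loader sketch (hypothetical, not Nutch code).
 * Non-JDK classes are looked up in this loader's own URLs (e.g. a plugin
 * folder) before falling back to the parent loader.
 */
public class ChildFirstClassLoader extends URLClassLoader {

    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Reuse a class this loader has already defined.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (name.startsWith("java.") || name.startsWith("javax.")) {
                    // Platform classes: keep the normal parent-first delegation.
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // Child first: search this loader's own jars.
                        c = findClass(name);
                    } catch (ClassNotFoundException notInChild) {
                        // Not in our jars: fall back to parent-first delegation.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

Keeping java.* classes parent-first is required, because the JVM refuses to define classes in the java package from an application loader. The trade-off of child-first loading is that a class present in both places exists twice, once per loader. Passing instances of it across the loader boundary can then fail with ClassCastException or LinkageError.
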


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)