Posted to user@tika.apache.org by Markus Jelsma <ma...@openindex.io> on 2012/02/08 13:07:11 UTC
tika-core, tika-parser?
Hi,
In Nutch we have a copy of tika-core. But with just that lib we also have
access to the Tika parser API from the other module. How does this all work?
I have had confusing results in the past (and now).
Right now we've added a class to org.apache.tika.parser.html but we get a
ClassNotFound with a newly compiled Tika. Our code compiles when we add
tika-parsers to the classpath, but when we run it we get an obscure exception:
Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class org.apache.tika.parser.dwg.DWGParser
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at sun.misc.Service$LazyIterator.next(Service.java:271)
at org.apache.nutch.parse.tika.TikaConfig.<init>(TikaConfig.java:149)
at org.apache.nutch.parse.tika.TikaConfig.getDefaultConfig(TikaConfig.java:211)
at org.apache.nutch.parse.tika.TikaParser.setConf(TikaParser.java:255)
at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:162)
at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:132)
at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:71)
at org.apache.nutch.parse.ParserChecker.run(ParserChecker.java:101)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:138)
When we previously patched Tika in the core module, everything went perfectly
well, but patching the parser module and getting it all compiled into
tika-core.jar seems tricky. Any advice? What am I missing? How do the parser
libs end up in the core jar?
Thanks
Re: tika-core, tika-parser?
Posted by Nick Burch <ni...@alfresco.com>.
On Wed, 8 Feb 2012, Markus Jelsma wrote:
> In Nutch we have a copy of tika-core. But with just that lib we also have
> access to the Tika parser API from the other module. How does this all work?
> I have had confusing results in the past (and now).
Tika Core contains the core of Tika, which includes the definition of how
parsers work, but not any parsers.
All the parsers themselves are in the Tika Parsers module. Most of the
parsers depend on third-party libraries, so it's normally recommended to
use either Maven or the OSGi bundle to have these pulled in for you.
> Right now we've added a class to org.apache.tika.parser.html but we get a
> ClassNotFound with a newly compiled Tika. Our code compiles when we add
> tika-parsers to the classpath, but when we run it we get an obscure exception:
>
> Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class org.apache.tika.parser.dwg.DWGParser
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:247)
> at sun.misc.Service$LazyIterator.next(Service.java:271)
> at org.apache.nutch.parse.tika.TikaConfig.<init>(TikaConfig.java:149)
> at org.apache.nutch.parse.tika.TikaConfig.getDefaultConfig(TikaConfig.java:211)
You've got a Tika parsers service file that says the DWG parser is
present, but you haven't included it. You should either include all the
Tika parsers, or not include the default
META-INF/services/org.apache.tika.parser.Parser file that lists them.
Nick
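To narrow down which listed parsers cannot be loaded, one option is to read the
service files on the classpath and try to load each class they name. The sketch
below is not part of Tika or Nutch. The class ParserServiceCheck and its methods
are hypothetical names, and the service file path is an assumption based on the
Parser interface name and the service lookup visible in the stack trace.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of Tika or Nutch.
public class ParserServiceCheck {
    // Assumed location of the parser service file on the classpath.
    static final String SERVICE_FILE =
            "META-INF/services/org.apache.tika.parser.Parser";

    /** Strips '#' comments and whitespace; returns null for an empty line. */
    static String parseLine(String line) {
        int hash = line.indexOf('#');
        if (hash >= 0) {
            line = line.substring(0, hash);
        }
        line = line.trim();
        return line.isEmpty() ? null : line;
    }

    /** Returns null if the class loads and initializes, else the failure text. */
    static String tryLoad(String className, ClassLoader loader) {
        try {
            Class.forName(className, true, loader);
            return null;
        } catch (Throwable t) {
            // Covers ClassNotFoundException, NoClassDefFoundError and
            // failures inside static initializers.
            return t.toString();
        }
    }

    /** Lists every class named in a service file that cannot be loaded. */
    static List<String> findBroken(ClassLoader loader) throws IOException {
        List<String> broken = new ArrayList<>();
        Enumeration<URL> urls = loader.getResources(SERVICE_FILE);
        while (urls.hasMoreElements()) {
            URL url = urls.nextElement();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    String name = parseLine(line);
                    if (name == null) {
                        continue;
                    }
                    String error = tryLoad(name, loader);
                    if (error != null) {
                        broken.add(name + " (listed in " + url + "): " + error);
                    }
                }
            }
        }
        return broken;
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (String problem : findBroken(loader)) {
            System.out.println(problem);
        }
    }
}
```

Run it with the same classpath as the failing job. Each printed line names a
parser class, the service file that lists it, and the reason it failed to load,
which should show whether the class itself or one of its third-party
dependencies is missing.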