Posted to user@tika.apache.org by Arthur Meneau <am...@xetus.com> on 2011/12/05 23:32:22 UTC

NoClassDefFoundError when parsing pdf files using ForkParser

I'm having trouble parsing PDF and DOCX files with ForkParser.  I finally figured out how to get the metadata from the content handler using ToXMLContentHandler, but I get no results when I parse PDF or DOCX files through ForkParser.  I have copied and pasted the error and stack trace below; I believe it applies only to PDF files, since I am not getting a distinct error for DOCX files.

I did a little searching and used Java's jar utility to verify that the class is present in the jar, and it is.  I was previously using AutoDetectParser directly, and that worked fine except that I could not limit Tika's memory usage, so the problem seems to be specific to ForkParser.
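
For anyone who wants to repeat the jar check programmatically, here is a minimal sketch using only the JDK's java.util.jar API. The class name JarClassCheck and its methods are hypothetical names chosen for illustration, not part of Tika. It assumes Java 7 or later for try-with-resources; on Java 6 the JarFile would need to be closed in a finally block instead. Note that a nested class such as MemoryURLStreamHandler$Record is stored in the jar under its binary name, with the "$" kept in the entry path.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Tika): checks whether a jar file
// contains the .class entry for a given binary class name.
public class JarClassCheck {

    // Converts a binary class name such as "a.b.Outer$Inner"
    // into the jar entry path "a/b/Outer$Inner.class".
    public static String entryNameFor(String binaryClassName) {
        return binaryClassName.replace('.', '/') + ".class";
    }

    // Returns true if the jar has an entry for the given class.
    public static boolean containsClass(Path jar, String binaryClassName) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.getJarEntry(entryNameFor(binaryClassName)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarClassCheck <jar> <binary.class.Name>");
            System.exit(2);
        }
        boolean found = containsClass(Paths.get(args[0]), args[1]);
        System.out.println(args[1] + (found ? " found in " : " NOT found in ") + args[0]);
    }
}
```

When running it from a shell, put the class name in single quotes so the "$" is not expanded, e.g. 'org.apache.tika.fork.MemoryURLStreamHandler$Record'. Keep in mind that finding the entry in the jar only shows the class is packaged; it does not show that the class is visible to the class loader of the forked process, which is where the stack trace below originates.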

Thanks,
-Arthur

StackTrace: 
log4j:WARN Caught Exception while in Loader.getResource. This may be innocuous.
java.lang.NoClassDefFoundError: org/apache/tika/fork/MemoryURLStreamHandler$Record
	at org.apache.tika.fork.MemoryURLStreamHandler.createURL(MemoryURLStreamHandler.java:46)
	at org.apache.tika.fork.ClassLoaderProxy.findResource(ClassLoaderProxy.java:73)
	at java.lang.ClassLoader.getResource(ClassLoader.java:977)
	at org.apache.log4j.helpers.Loader.getResource(Loader.java:96)
	at org.apache.log4j.LogManager.<clinit>(LogManager.java:105)
	at org.apache.log4j.Logger.getLogger(Logger.java:104)
	at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:289)
	at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:109)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.commons.logging.impl.LogFactoryImpl.createLogFromClass(LogFactoryImpl.java:1116)
	at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:914)
	at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604)
	at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336)
	at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310)
	at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
	at org.apache.pdfbox.pdfparser.BaseParser.<clinit>(BaseParser.java:58)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1087)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:80)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.tika.fork.ForkServer.call(ForkServer.java:136)
	at org.apache.tika.fork.ForkServer.processRequests(ForkServer.java:116)
	at org.apache.tika.fork.ForkServer.main(ForkServer.java:64)

Re: NoClassDefFoundError when parsing pdf files using ForkParser

Posted by Arthur Meneau <am...@xetus.com>.
I'm still having issues with this. I was also having trouble parsing iWork files, and Nick was helping me with that. He had me test the file I was parsing with the Tika 1.0 client, and I noticed that it has a fork option: passing the "-f" argument makes the client use ForkParser.  I tried this but got the error below.  Can someone test this with Tika 1.0?

Here's the command I used (I changed the file path, but assume the real path is correct):
java -jar tika-app-1.0.jar -f /path/to/some/testPDF.pdf

This is the exception I received, with its stack trace:
Exception in thread "main" java.io.IOException: Broken pipe
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:260)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
	at java.io.DataOutputStream.flush(DataOutputStream.java:106)
	at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:163)
	at org.apache.tika.fork.ForkClient.sendObject(ForkClient.java:137)
	at org.apache.tika.fork.ForkClient.<init>(ForkClient.java:71)
	at org.apache.tika.fork.ForkParser.acquireClient(ForkParser.java:159)
	at org.apache.tika.fork.ForkParser.parse(ForkParser.java:118)
	at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:128)
	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:392)
	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:99)

Any help would be greatly appreciated!
Thanks!
-Arthur Meneau

On Dec 5, 2011, at 2:47 PM, Arthur Meneau wrote:

> Yikes, I forgot to mention: this is using Tika 1.0 with all the appropriate dependencies.  I'm using PDFBox version 1.6.0, running on Java 6 on Mac OS X 10.6.
> 
> Any help getting PDFs to parse would be greatly appreciated!
> 
> Thanks,
> -Arthur Meneau


Re: NoClassDefFoundError when parsing pdf files using ForkParser

Posted by Arthur Meneau <am...@xetus.com>.
Yikes, I forgot to mention: this is using Tika 1.0 with all the appropriate dependencies.  I'm using PDFBox version 1.6.0, running on Java 6 on Mac OS X 10.6.

Any help getting PDFs to parse would be greatly appreciated!

Thanks,
-Arthur Meneau
