Posted to user@tika.apache.org by Arthur Meneau <am...@xetus.com> on 2011/12/05 23:43:35 UTC

Apple iWork document parsing

I am having trouble parsing iWork documents with Tika 1.0.  These documents were saved with the versions specified by Tika's API (Keynote 5.1.1, Numbers 2.1, Pages 4.1).  I have copied and pasted the error I am receiving below. How can I get iWork documents to parse correctly?

Thanks,
-Arthur Meneau

Stack Trace: 
java.lang.NullPointerException
	at org.apache.tika.parser.iwork.IWorkPackageParser$IWORKDocumentType.detectType(IWorkPackageParser.java:125)
	at org.apache.tika.parser.iwork.IWorkPackageParser$IWORKDocumentType.detectType(IWorkPackageParser.java:106)
	at org.apache.tika.parser.pkg.ZipContainerDetector.detectIWork(ZipContainerDetector.java:163)
	at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:76)
	at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:60)
	at org.apache.tika.Tika.detect(Tika.java:133)
	at org.apache.tika.Tika.detect(Tika.java:267)
	at org.apache.tika.Tika.detect(Tika.java:248)
	at xetus.util.io.FileAnalyzer.getMetadata(FileAnalyzer.java:156)
	at xetus.util.io.FileAnalyzer.getMetadata(FileAnalyzer.java:72)
	at xetus.util.io.BulkFileAnalyzerTest.testBulkFileTypeDetection(BulkFileAnalyzerTest.java:137)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at junit.framework.TestCase.runTest(TestCase.java:154)
	at junit.framework.TestCase.runBare(TestCase.java:127)
	at junit.framework.TestResult$1.protect(TestResult.java:106)
	at junit.framework.TestResult.runProtected(TestResult.java:124)
	at junit.framework.TestResult.run(TestResult.java:109)
	at junit.framework.TestCase.run(TestCase.java:118)
	at junit.framework.TestSuite.runTest(TestSuite.java:208)
	at junit.framework.TestSuite.run(TestSuite.java:203)
	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)


Re: Apple iWork document parsing

Posted by Arthur Meneau <am...@xetus.com>.
Nick,

This is done.  The files I used originally were very small test files, so I have included all three so that you can test Keynote, Pages and Numbers.

Thanks for the quick response,
-Arthur


On Dec 5, 2011, at 5:02 PM, Nick Burch wrote:

> On Mon, 5 Dec 2011, Arthur Meneau wrote:
>> I am having trouble parsing iWork documents with Tika 1.0.  These documents are being saved with the appropriate versions specified by Tika's API (Keynote 5.1.1, Numbers 2.1, Pages 4.1).  I have copy and pasted the error I am receiving below. How can I get iWork documents to correctly parse?
> 
> Any chance that you could create a new issue in JIRA, and upload a small sample file that causes the error? (Ideally the smallest file you can create that gives the problem)
> 
> Cheers
> Nick


Re: Apple iWork document parsing

Posted by Nick Burch <ni...@alfresco.com>.
On Mon, 5 Dec 2011, Arthur Meneau wrote:
> I am having trouble parsing iWork documents with Tika 1.0.  These 
> documents are being saved with the appropriate versions specified by 
> Tika's API (Keynote 5.1.1, Numbers 2.1, Pages 4.1).  I have copy and 
> pasted the error I am receiving below. How can I get iWork documents to 
> correctly parse?

Any chance that you could create a new issue in JIRA, and upload a small 
sample file that causes the error? (Ideally the smallest file you can 
create that gives the problem)

Cheers
Nick