Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2011/05/06 04:52:03 UTC

[jira] [Created] (TIKA-655) IWorkPackageParser / IWorkParser not registering properly

IWorkPackageParser / IWorkParser not registering properly
---------------------------------------------------------

                 Key: TIKA-655
                 URL: https://issues.apache.org/jira/browse/TIKA-655
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.9
            Reporter: Nick Burch
            Assignee: Nick Burch


If you try to use AutoDetectParser to handle an iWork document, it'll fail with:
 org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
	at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198)

However, IWorkPackageParser works fine. It seems that IWorkParser only handles an individual part extracted from the zip, but it is registered as the handler for the iWork mime types themselves. It is therefore handed the whole zip file, and breaks.
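The failure mode above can be reproduced without Tika. The sketch below uses only the JDK; the class name, the method names and the "index.xml" entry name are illustrative assumptions, not Tika code. It builds a small zip in memory and hands it to a SAX parser twice. The first time it passes the whole file, which is roughly what happens when a part-level parser is registered for the package mime type. The second time it passes only the single extracted entry. Only the second parse should succeed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class ZipVersusPartDemo {

    // Build an in-memory zip with one XML entry, standing in for an iWork package.
    static byte[] buildZip(String entryName, String xml) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Returns true if the stream parses as XML, false if the parser rejects it.
    static boolean parsesAsXml(InputStream in) throws ParserConfigurationException {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler());
            return true;
        } catch (SAXException | IOException e) {
            // Raw zip bytes start with "PK", not "<", so the parser gives up at line 1.
            // The rejection usually surfaces as a SAXParseException, as in the report;
            // IOException is caught too in case it surfaces as an encoding error instead.
            System.out.println("Rejected: " + e.getMessage());
            return false;
        }
    }

    // Open the zip and return a stream positioned at the start of the named entry.
    static InputStream openEntry(byte[] zipBytes, String entryName) throws IOException {
        ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes));
        for (ZipEntry e = zin.getNextEntry(); e != null; e = zin.getNextEntry()) {
            if (e.getName().equals(entryName)) {
                return zin; // reads stop at the end of this entry
            }
        }
        throw new IOException("No entry named " + entryName);
    }

    public static void main(String[] args) throws Exception {
        byte[] zipBytes = buildZip("index.xml", "<doc><p>Hello</p></doc>");
        System.out.println("Whole zip parses as XML: "
                + parsesAsXml(new ByteArrayInputStream(zipBytes)));
        System.out.println("Single entry parses as XML: "
                + parsesAsXml(openEntry(zipBytes, "index.xml")));
    }
}
```

The same XML content is accepted or rejected depending only on whether the zip has been unpacked first. This is why a parser that expects one part cannot safely be the registered handler for the whole-file type.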

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (TIKA-655) IWorkPackageParser / IWorkParser not registering properly

Posted by "Nick Burch (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch resolved TIKA-655.
-----------------------------

       Resolution: Fixed
    Fix Version/s: 1.0


[jira] [Commented] (TIKA-655) IWorkPackageParser / IWorkParser not registering properly

Posted by "Nick Burch (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029720#comment-13029720 ] 

Nick Burch commented on TIKA-655:
---------------------------------

In r1100039, I've moved the iWork detection logic from ZipContainerDetector into IWorkPackageParser, and made it detect in the same way that OfficeParser does.

I then put the content handler selection logic into IWorkPackageParser and removed IWorkParser, which claimed to be a regular parser but in fact only worked when called from IWorkPackageParser. As a result, the Tika app can now parse iWork files, and the unit tests still pass.
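The shape of this fix can be sketched roughly as follows. One package-level parser opens the zip itself, decides the document type from its entries, and only then hands the chosen XML part to a type-specific handler. This is not Tika's IWorkPackageParser code. The entry-name table, the type labels and the counting handler are hypothetical stand-ins for the real detection logic and content handlers. The entry names `index.apxl` and `index.xml` are assumptions for illustration, and real detection may need more than the entry name, for example the root element.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class PackageDispatchSketch {

    // Hypothetical mapping from main entry name to document type (an assumption, not Tika's table).
    static final Map<String, String> MAIN_ENTRY_TYPES = Map.of(
            "index.apxl", "keynote",
            "index.xml", "pages-or-numbers");

    // Hypothetical per-type content handler: it only records the type and counts elements.
    static class CountingHandler extends DefaultHandler {
        final String type;
        int elements;

        CountingHandler(String type) {
            this.type = type;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    // Parse a whole package: detect the type from the entry names, select a handler,
    // and give the XML parser only the matching entry, never the raw zip bytes.
    static CountingHandler parsePackage(InputStream packageStream) throws Exception {
        ZipInputStream zin = new ZipInputStream(packageStream);
        for (ZipEntry e = zin.getNextEntry(); e != null; e = zin.getNextEntry()) {
            String type = MAIN_ENTRY_TYPES.get(e.getName());
            if (type == null) {
                continue; // not a main document part; skip it
            }
            CountingHandler handler = new CountingHandler(type);
            SAXParserFactory.newInstance().newSAXParser().parse(zin, handler);
            return handler;
        }
        throw new IOException("No recognised main entry found");
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("index.apxl"));
            zip.write("<presentation><slide/><slide/></presentation>"
                    .getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        CountingHandler h = parsePackage(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(h.type + ": " + h.elements + " elements");
    }
}
```

Because the XML parser only ever sees an extracted entry, the whole-file iWork mime types can map safely to this one parser. The separately registered IWorkParser lacked that property.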

