Posted to dev@tika.apache.org by "Nick Burch (Commented) (JIRA)" <ji...@apache.org> on 2011/11/25 16:35:40 UTC

[jira] [Commented] (TIKA-790) Reduce duplication between POIFSDocumentType (in OfficeParser) and POIFSContainerDetector

    [ https://issues.apache.org/jira/browse/TIKA-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157197#comment-13157197 ] 

Nick Burch commented on TIKA-790:
---------------------------------

One possible solution for the few extra types that POIFSDocumentType has (such as Encrypted) is to add a parameter to the mimetype returned by POIFSContainerDetector, e.g. for an Encrypted file return "application/x-tika-msoffice; format=encrypted".
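
As a rough illustration of the parameter idea, here is a minimal sketch in plain Java, with no Tika dependency. The class MsOfficeTypeParams and its methods withFormat, baseType and parameters are hypothetical names made up for this example; they are not part of Tika or POI. Inside Tika this would more likely be done with the MediaType class, which can already carry parameters, rather than with raw strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only: not part of Tika or POI.
class MsOfficeTypeParams {
    // Generic OLE2 type used in the comment above
    static final String BASE = "application/x-tika-msoffice";

    // Appends a "format" parameter to a base type,
    // e.g. "application/x-tika-msoffice; format=encrypted"
    static String withFormat(String baseType, String format) {
        return baseType + "; format=" + format;
    }

    // Returns the type with any parameters stripped off
    static String baseType(String type) {
        int semi = type.indexOf(';');
        return (semi < 0 ? type : type.substring(0, semi)).trim();
    }

    // Collects the "name=value" pairs that follow the base type
    static Map<String, String> parameters(String type) {
        Map<String, String> params = new LinkedHashMap<>();
        String[] parts = type.split(";");
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq > 0) {
                params.put(parts[i].substring(0, eq).trim(),
                           parts[i].substring(eq + 1).trim());
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String detected = withFormat(BASE, "encrypted");
        System.out.println(detected);
        System.out.println(baseType(detected));
        System.out.println(parameters(detected).get("format"));
    }
}
```

Callers that only care about the container type would compare against the base type and ignore the parameter, while something like OfficeParser could look at "format" to decide whether the file needs special handling.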
                
> Reduce duplication between POIFSDocumentType (in OfficeParser) and POIFSContainerDetector
> -----------------------------------------------------------------------------------------
>
>                 Key: TIKA-790
>                 URL: https://issues.apache.org/jira/browse/TIKA-790
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Nick Burch
>            Assignee: Nick Burch
>
> For historical reasons, we now have two parts of Tika that try to identify the type of an OLE2 based file.
> POIFSDocumentType is able to detect a few kinds of files that POIFSContainerDetector cannot (e.g. Encrypted and OLE Native), most of which may not map well onto mimetypes. POIFSDocumentType also lacks some of the logic in the main detector, and only handles the files supported by the office parser.
> We should probably try to reduce the duplication. One option is to add the few extra types into the Detector somehow; the other is to use the detector first and do additional specific checks afterwards.
