Posted to dev@tika.apache.org by "Marco Quaranta (JIRA)" <ji...@apache.org> on 2013/12/09 10:22:08 UTC

[jira] [Updated] (TIKA-1204) DWFX files detection

     [ https://issues.apache.org/jira/browse/TIKA-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marco Quaranta updated TIKA-1204:
---------------------------------

    Attachment: General assembly filter.dwfx

DWFX test file

> DWFX files detection
> --------------------
>
>                 Key: TIKA-1204
>                 URL: https://issues.apache.org/jira/browse/TIKA-1204
>             Project: Tika
>          Issue Type: Improvement
>          Components: detector, mime
>    Affects Versions: 1.4
>            Reporter: Marco Quaranta
>            Priority: Minor
>         Attachments: General assembly filter.dwfx
>
>
> DWFX files are AutoCAD [Design Web Format|http://en.wikipedia.org/wiki/Design_Web_Format] files that follow the [Open Packaging Conventions|http://en.wikipedia.org/wiki/Open_Packaging_Conventions].
> Tika "correctly" detects these files as application/zip, because an OPC package is a ZIP container.
> It would be better if Tika could recognize the true MIME type: model/vnd.dwfx+xps. (y)
> Please add logic to ZipContainerDetector so that DWFX files can be detected. We need a method that behaves like detectOfficeOpenXML(OPCPackage pkg):
> {noformat}
> // Look up the package-level relationship that marks a DWFX document sequence.
> PackageRelationshipCollection core = pkg.getRelationshipsByType("http://schemas.autodesk.com/dwfx/2007/relationships/documentsequence");
> if (core.size() != 1) {
>     // Not a valid DWFX package
>     return null;
> }
> // Report the content type of the part that the relationship points to.
> PackagePart corePart = pkg.getPart(core.getRelationship(0));
> String coreType = corePart.getContentType();
> return MediaType.parse(coreType);
> {noformat}
> Thank you,
> Marco
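
For illustration only, the check requested above can also be sketched without POI or Tika, using just java.util.zip and the JDK XML parser. This is not the detector's implementation: the class name DwfxSniffer and both method names are hypothetical, and the sketch assumes the document-sequence relationship is declared in the package's root relationships part (_rels/.rels), as the Open Packaging Conventions prescribe for package-level relationships. It only answers "does this ZIP look like DWFX?" and does not resolve a content type.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper for illustration; not part of Tika or POI. */
final class DwfxSniffer {
    /** Relationship type quoted in the issue description. */
    public static final String DWFX_DOCUMENT_SEQUENCE =
            "http://schemas.autodesk.com/dwfx/2007/relationships/documentsequence";

    /** XML namespace used by OPC relationship parts. */
    public static final String RELS_NS =
            "http://schemas.openxmlformats.org/package/2006/relationships";

    /**
     * Returns true if the ZIP stream contains a root relationships part
     * that declares exactly one DWFX document-sequence relationship.
     */
    public static boolean looksLikeDwfx(InputStream in)
            throws IOException, ParserConfigurationException, SAXException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equalsIgnoreCase("_rels/.rels")) {
                    // Buffer the entry so the XML parser cannot close the ZIP stream.
                    byte[] xml = zip.readAllBytes();
                    return countRelationships(xml, DWFX_DOCUMENT_SEQUENCE) == 1;
                }
            }
        }
        return false; // no root relationships part: not an OPC package
    }

    /** Counts Relationship elements whose Type attribute equals the given type. */
    public static int countRelationships(byte[] relsXml, String type)
            throws IOException, ParserConfigurationException, SAXException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Input is untrusted: reject DOCTYPE declarations to avoid XXE.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(relsXml));
        NodeList rels = doc.getElementsByTagNameNS(RELS_NS, "Relationship");
        int count = 0;
        for (int i = 0; i < rels.getLength(); i++) {
            if (type.equals(((Element) rels.item(i)).getAttribute("Type"))) {
                count++;
            }
        }
        return count;
    }
}
```

Inside Tika, the OPCPackage-based snippet from the issue is probably the more natural fit, since the issue indicates ZipContainerDetector already works with an OPCPackage in detectOfficeOpenXML; the standalone version is only meant to make the underlying check visible.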


