Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2014/05/11 00:11:20 UTC

[jira] [Commented] (TIKA-1204) DWFX files detection

    [ https://issues.apache.org/jira/browse/TIKA-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992896#comment-13992896 ] 

Nick Burch commented on TIKA-1204:
----------------------------------

Any chance of a much smaller sample DWFX file? The one supplied is a little larger than we generally like to use in unit tests.

> DWFX files detection
> --------------------
>
>                 Key: TIKA-1204
>                 URL: https://issues.apache.org/jira/browse/TIKA-1204
>             Project: Tika
>          Issue Type: Improvement
>          Components: detector, mime
>    Affects Versions: 1.4
>            Reporter: Marco Quaranta
>            Priority: Minor
>         Attachments: General assembly filter.dwfx
>
>
> DWFX files are AutoCAD [Design Web Format|http://en.wikipedia.org/wiki/Design_Web_Format] files and follow the [Open Packaging Conventions|http://en.wikipedia.org/wiki/Open_Packaging_Conventions].
> Tika "correctly" detects these files as application/zip. 
> It would be better if Tika could recognize the true mimetype: model/vnd.dwfx+xps. (y)
> Please add logic to ZipContainerDetector so that DWFX files can be detected. We need a method that behaves like detectOfficeOpenXML(OPCPackage pkg):
> {noformat}
> PackageRelationshipCollection core = pkg.getRelationshipsByType("http://schemas.autodesk.com/dwfx/2007/relationships/documentsequence");
> if (core.size() != 1) {
>     // Invalid DWFX Package received
>     return null;
> }
> PackagePart corePart = pkg.getPart(core.getRelationship(0));
> String coreType = corePart.getContentType();
> return MediaType.parse(coreType);
> {noformat}
> Thank you,
> Marco
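
For anyone wanting to experiment with the idea outside of Tika and POI, here is a minimal sketch of the same check using only the JDK. It is not the ZipContainerDetector code and not a proposed patch. The class name DwfxSniffer and its methods are hypothetical. It assumes the package-level relationships live in a zip entry named _rels/.rels, and it returns the fixed media type requested in the issue rather than looking up the content type of the target part as the snippet above does.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Tika or POI.
final class DwfxSniffer {
    // Relationship type quoted in the issue description.
    static final String DWFX_REL =
            "http://schemas.autodesk.com/dwfx/2007/relationships/documentsequence";
    // Media type requested in the issue description.
    static final String DWFX_TYPE = "model/vnd.dwfx+xps";
    // Assumption: package-level relationships are stored under this entry name.
    static final String RELS_ENTRY = "_rels/.rels";

    // Returns DWFX_TYPE when the package has exactly one DWFX
    // document-sequence relationship, otherwise null.
    static String detect(InputStream in) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (RELS_ENTRY.equals(entry.getName())) {
                    // readAllBytes (Java 9+) stops at the end of the current entry.
                    int count = countDwfxRelationships(zip.readAllBytes());
                    return count == 1 ? DWFX_TYPE : null;
                }
            }
        }
        return null; // no relationships part found
    }

    // Counts Relationship elements whose Type attribute is the DWFX type.
    static int countDwfxRelationships(byte[] relsXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations, since the input is untrusted.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        NodeList rels = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(relsXml))
                .getElementsByTagNameNS("*", "Relationship");
        int count = 0;
        for (int i = 0; i < rels.getLength(); i++) {
            Element rel = (Element) rels.item(i);
            if (DWFX_REL.equals(rel.getAttribute("Type"))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java DwfxSniffer <file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(detect(in));
        }
    }
}
```

Because it scans the zip stream entry by entry, the sketch also works on inputs that are not seekable, at the cost of reading past unrelated entries until the relationships part turns up.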



--
This message was sent by Atlassian JIRA
(v6.2#6252)