Posted to dev@tika.apache.org by "Brian Jackson (JIRA)" <ji...@apache.org> on 2019/03/28 19:32:00 UTC

[jira] [Commented] (TIKA-1522) Exe being detected as application/x-msdownload

    [ https://issues.apache.org/jira/browse/TIKA-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804228#comment-16804228 ] 

Brian Jackson commented on TIKA-1522:
-------------------------------------

I just ran into this same issue: an exe similar to the attached one came back as {{application/x-msdownload; format=pe}} when I was expecting {{application/x-dosexec}}. I'm not sure what other MIME tools do, but if you check the exe at [https://htmlstrip.com/mime-file-type-checker] it comes back as {{application/x-dosexec}}. That site says it doesn't just look at the file extension, but who knows.

Ultimately this was only an issue because we are using Tika to determine whether files are what they claim to be (if you upload a txt file, is it really a text file?). My plan was to call {{detect}} to get the MIME/media type, then use {{MimeType.getExtensions()}} to check whether the file's extension is among the extensions of the detected type. That fails for certain exes, because the exe extension isn't among the extensions of the detected MimeType or of any of its supertypes. I wonder whether the original suggestion on this ticket, _*.exe must be included in application/x-msdownload glob pattern_, would solve this, since exe would then be recognized as a valid extension of {{application/x-msdownload}}.
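The extension cross-check described above could look roughly like the following. This is a minimal, hypothetical sketch: the class {{ExtensionCheck}}, the method {{extensionMatches}}, and the toy type tables are illustrative assumptions, not Tika's actual definitions. The two lookups are passed in as functions so the logic runs without Tika. Wired to Tika, they would presumably be backed by {{MimeTypes.getDefaultMimeTypes().forName(type).getExtensions()}} and {{getMediaTypeRegistry().getSupertype(...)}}. The toy tables mimic the situation in this comment: exe appears nowhere in the type chain until it is added to {{application/x-msdownload}}, as the ticket suggests.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class ExtensionCheck {

    // Returns true if fileName's extension is registered for detectedType or for
    // any of its supertypes. Both lookups are hypothetical stand-ins for a MIME
    // repository: extensionsOf returns a (possibly empty) list of extensions,
    // supertypeOf returns the parent type or null at the root.
    public static boolean extensionMatches(String fileName, String detectedType,
                                           Function<String, List<String>> extensionsOf,
                                           Function<String, String> supertypeOf) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return false; // no extension to cross-reference
        }
        // Keep the leading dot; Tika reports extensions in the form ".txt".
        String ext = fileName.substring(dot).toLowerCase(Locale.ROOT);
        // Walk from the detected type up through its supertypes.
        for (String type = detectedType; type != null; type = supertypeOf.apply(type)) {
            if (extensionsOf.apply(type).contains(ext)) {
                return true;
            }
        }
        return false;
    }

    // Toy supertype chain for illustration only, NOT Tika's real hierarchy.
    public static final Map<String, String> SUPERTYPES = Map.of(
            "application/x-msdownload; format=pe", "application/x-msdownload",
            "application/x-msdownload", "application/octet-stream",
            "text/plain", "application/octet-stream");

    public static void main(String[] args) {
        // Toy extension table with no ".exe" anywhere in the exe's chain.
        Map<String, List<String>> before = Map.of("text/plain", List.of(".txt"));
        // Same table after adding "*.exe" to application/x-msdownload.
        Map<String, List<String>> after = Map.of(
                "text/plain", List.of(".txt"),
                "application/x-msdownload", List.of(".exe"));

        String exeType = "application/x-msdownload; format=pe";
        System.out.println(extensionMatches("report.txt", "text/plain",
                t -> before.getOrDefault(t, List.of()), SUPERTYPES::get)); // true
        System.out.println(extensionMatches("Search.exe", exeType,
                t -> before.getOrDefault(t, List.of()), SUPERTYPES::get)); // false
        System.out.println(extensionMatches("Search.exe", exeType,
                t -> after.getOrDefault(t, List.of()), SUPERTYPES::get));  // true
    }
}
```

Walking the supertypes matters because the detected type can carry parameters such as {{; format=pe}}, so the exact detected type may have no extensions of its own. In this toy setup, adding exe to {{application/x-msdownload}} is enough for the check to pass; whether real Tika behaves the same depends on its actual mimetypes definitions.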

> Exe being detected as application/x-msdownload
> ----------------------------------------------
>
>                 Key: TIKA-1522
>                 URL: https://issues.apache.org/jira/browse/TIKA-1522
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.7
>            Reporter: Luis Filipe Nassif
>            Priority: Minor
>         Attachments: Search.exe
>
>
> If it is ok, *.exe must be included in application/x-msdownload glob pattern definitions. If it should be detected as application/x-dosexec, the hierarchy between application/x-dosexec, application/x-msdownload and PE based formats must be changed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)