Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2017/07/01 20:11:03 UTC

[jira] [Commented] (TIKA-2409) Tar has different mime type by name vs contents

    [ https://issues.apache.org/jira/browse/TIKA-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071387#comment-16071387 ] 

Nick Burch commented on TIKA-2409:
----------------------------------

This is as expected. GTar is a specialisation of tar: all gtar files are tar files, but not all tar files are gtar files.

This is why you need to give Tika the contents for the most reliable detection. By name alone, you may only end up with a more generic/common type.
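
To illustrate the difference, here is a minimal, self-contained sketch. It is not Tika's implementation. The class and method names are hypothetical, the octet-stream fallback is an assumption, and the magic values follow the commonly documented tar header layout (a POSIX ustar header carries "ustar" plus a NUL at offset 257, while GNU tar writes "ustar" plus two spaces and a NUL). It only shows why a file name can yield the generic type while the bytes can yield the specialised one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration only; Tika's real detectors are driven by its mime type definitions.
class TarTypeSketch {
    // Offset of the magic field inside the first 512-byte tar header block.
    static final int MAGIC_OFFSET = 257;
    // GNU tar: "ustar", two spaces, NUL.
    static final byte[] GNU_MAGIC = "ustar  \0".getBytes(StandardCharsets.US_ASCII);
    // POSIX ustar: "ustar", NUL.
    static final byte[] POSIX_MAGIC = "ustar\0".getBytes(StandardCharsets.US_ASCII);

    // The name only tells us the extension, so the best answer is the generic tar type.
    static String detectByName(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".tar")
                ? "application/x-tar"
                : "application/octet-stream";
    }

    // The contents can tell a GNU tar header apart from a plain POSIX one.
    static String detectByContent(byte[] header) {
        if (hasMagic(header, GNU_MAGIC)) {
            return "application/x-gtar";
        }
        if (hasMagic(header, POSIX_MAGIC)) {
            return "application/x-tar";
        }
        return "application/octet-stream";
    }

    // True if the bytes at MAGIC_OFFSET equal the given magic sequence.
    static boolean hasMagic(byte[] data, byte[] magic) {
        int end = MAGIC_OFFSET + magic.length;
        return data.length >= end
                && Arrays.equals(Arrays.copyOfRange(data, MAGIC_OFFSET, end), magic);
    }

    // Builds a zero-filled 512-byte block carrying only the given magic, for demonstration.
    static byte[] headerWith(byte[] magic) {
        byte[] block = new byte[512];
        System.arraycopy(magic, 0, block, MAGIC_OFFSET, magic.length);
        return block;
    }

    public static void main(String[] args) {
        System.out.println(detectByName("test.tar"));                 // application/x-tar
        System.out.println(detectByContent(headerWith(GNU_MAGIC)));   // application/x-gtar
        System.out.println(detectByContent(headerWith(POSIX_MAGIC))); // application/x-tar
    }
}
```

In this sketch a GNU tar archive named test.tar comes back as application/x-tar by name and application/x-gtar by content, which is the same pattern the test in the issue shows.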

> Tar has different mime type by name vs contents
> -----------------------------------------------
>
>                 Key: TIKA-2409
>                 URL: https://issues.apache.org/jira/browse/TIKA-2409
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>            Reporter: Collin Peters
>
> [TestMimeTypes.java#L360|https://github.com/apache/tika/blob/master/tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java#L360] has the following:
> {code}
> assertTypeByName("application/x-tar",  "test.tar");
> assertTypeByData("application/x-gtar",  "test-documents.tar"); // GNU TAR
> {code}
> The {{tar}} extension is detected as {{application/x-tar}} by name, but as {{application/x-gtar}} by contents. This doesn't seem to match up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)