Posted to dev@tika.apache.org by "Alex Ott (Updated) (JIRA)" <ji...@apache.org> on 2011/11/07 11:19:51 UTC

[jira] [Updated] (TIKA-697) Tika reports the content type of AR archives as "text/plain"

     [ https://issues.apache.org/jira/browse/TIKA-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Ott updated TIKA-697:
--------------------------

    Attachment: tika-697.diff

This patch adds a signature for Unix archive files (.a).

I think the signature for .deb files should also be updated accordingly.
                
> Tika reports the content type of AR archives as "text/plain"
> ------------------------------------------------------------
>
>                 Key: TIKA-697
>                 URL: https://issues.apache.org/jira/browse/TIKA-697
>             Project: Tika
>          Issue Type: Bug
>         Environment: Linux (CentOS 5.6)
>            Reporter: PNS
>            Priority: Trivial
>         Attachments: tika-697.diff
>
>
> The Tika.detect(InputStream) method returns "text/plain" for AR archives created with the Linux "Create Archive" option of Nautilus (available via right-clicking on a file).
> The Apache Commons Compress "autodetection" code of the ArchiveStreamFactory looks at the first 12 bytes of the stream and correctly identifies the type as AR.
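
For illustration, here is a minimal sketch of the kind of magic-byte check discussed above. It is not the code from tika-697.diff, nor Tika's or Commons Compress's implementation; the class and method names are hypothetical. It assumes the standard ar global header "!<arch>\n" (8 bytes) and that a .deb is an ar archive whose first member is named "debian-binary".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper, for illustration only; not part of Tika. */
public final class ArSignatureSketch {

    // Global header that starts every ar archive (8 bytes).
    private static final byte[] AR_MAGIC =
            "!<arch>\n".getBytes(StandardCharsets.US_ASCII);

    // Name of the first member in a Debian package (assumption stated above).
    private static final byte[] DEB_MEMBER =
            "debian-binary".getBytes(StandardCharsets.US_ASCII);

    private ArSignatureSketch() {
    }

    // True if 'data' contains 'prefix' starting at 'offset'.
    static boolean startsWithAt(byte[] data, int offset, byte[] prefix) {
        if (data == null || data.length < offset + prefix.length) {
            return false;
        }
        byte[] slice = Arrays.copyOfRange(data, offset, offset + prefix.length);
        return Arrays.equals(slice, prefix);
    }

    /** Checks only the leading 8-byte ar magic. */
    public static boolean looksLikeAr(byte[] header) {
        return startsWithAt(header, 0, AR_MAGIC);
    }

    /** Checks the ar magic followed directly by the "debian-binary" member name. */
    public static boolean looksLikeDeb(byte[] header) {
        return looksLikeAr(header)
                && startsWithAt(header, AR_MAGIC.length, DEB_MEMBER);
    }
}
```

Because every .deb would also match the plain ar check, a detector using both signatures would need to test the more specific .deb pattern first, which may be why the .deb signature needs revisiting alongside the new one.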
