Posted to dev@tika.apache.org by "Pavel Micka (JIRA)" <ji...@apache.org> on 2015/05/20 09:51:59 UTC

[jira] [Created] (TIKA-1632) ZLIB magic detection support

Pavel Micka created TIKA-1632:
---------------------------------

             Summary: ZLIB magic detection support
                 Key: TIKA-1632
                 URL: https://issues.apache.org/jira/browse/TIKA-1632
             Project: Tika
          Issue Type: Improvement
          Components: detector
            Reporter: Pavel Micka
            Priority: Minor


In our environment we encounter many compressed streams. One of them, ZLIB, is currently not supported by Tika. According to my sources and experience, the following magic values cover the majority of ZLIB streams:

    <mime-type type="application/zlib">
        <_comment>Zlib Compressed Archive</_comment>
        <magic priority="45">
            <match value="\x78\x01" type="string" offset="0" />
            <match value="\x78\x9c" type="string" offset="0" />
            <match value="\x78\xda" type="string" offset="0" />
        </magic>
    </mime-type>

The ZLIB header is well described here:
http://stackoverflow.com/questions/9050260/what-does-a-zlib-header-look-like
The original specification is RFC 1950:
http://tools.ietf.org/html/rfc1950
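Per RFC 1950, the first byte (CMF) carries the compression method in its low nibble (8 = deflate) and the window size in its high nibble (7 = 32K window, hence 0x78), and the first two bytes read as a 16-bit big-endian number (CMF*256 + FLG) must be a multiple of 31. The second byte varies with the compression-level hint: standard zlib emits 0x01 for levels 0-1, 0x5E for levels 2-5, 0x9C for the default level 6 and 0xDA for levels 7-9, so 0x78 0x5E is also valid even though the proposed magic list omits it. The following is a minimal sketch of that header check. The class and method names (`ZlibMagic`, `looksLikeZlib`, `deflate`) are hypothetical and not part of Tika. The sketch only uses `java.util.zip.Deflater` to produce sample zlib data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical helper illustrating the RFC 1950 header rule; not a Tika API.
public class ZlibMagic {

    /**
     * Returns true if the first two bytes form a valid RFC 1950 zlib header:
     * CM (low nibble of CMF) must be 8 (deflate), CINFO (high nibble) must be
     * at most 7 (window size up to 32K), and CMF*256 + FLG must be a multiple of 31.
     */
    static boolean looksLikeZlib(byte[] data) {
        if (data == null || data.length < 2) {
            return false;
        }
        int cmf = data[0] & 0xFF;
        int flg = data[1] & 0xFF;
        boolean deflateMethod = (cmf & 0x0F) == 8;
        boolean validWindow = (cmf >> 4) <= 7;
        boolean checkBitsOk = ((cmf << 8) | flg) % 31 == 0;
        return deflateMethod && validWindow && checkBitsOk;
    }

    /** Compresses input with Deflater in its default (zlib-wrapped) mode. */
    static byte[] deflate(byte[] input, int level) {
        Deflater deflater = new Deflater(level); // nowrap = false: zlib header + Adler-32 trailer
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] text = "hello, zlib".getBytes(StandardCharsets.UTF_8);
        int[] levels = {Deflater.BEST_SPEED, Deflater.DEFAULT_COMPRESSION, Deflater.BEST_COMPRESSION};
        for (int level : levels) {
            byte[] z = deflate(text, level);
            // Print the two header bytes and whether they pass the RFC 1950 check.
            System.out.printf("level %2d -> %02x %02x, zlib=%b%n",
                    level, z[0] & 0xFF, z[1] & 0xFF, looksLikeZlib(z));
        }
    }
}
```

A two-byte check like this is weak on its own (roughly one in 31 arbitrary byte pairs with a 0x78 first byte passes), which is presumably why the proposed `<magic>` block pins both bytes to a few common values and uses a modest priority.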



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)