Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2018/02/26 22:40:00 UTC
[jira] [Commented] (TIKA-2576) Add application/zstd detection and parser
[ https://issues.apache.org/jira/browse/TIKA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377727#comment-16377727 ]
Tim Allison commented on TIKA-2576:
-----------------------------------
This is great. Thank you for opening this issue.
Fellow devs, the dependency is BSD-2 licensed, which is good, but it contains 3 MB of native libs. Should we add this dependency with provided scope, like we do with xerial's sqlite?
> Add application/zstd detection and parser
> -----------------------------------------
>
> Key: TIKA-2576
> URL: https://issues.apache.org/jira/browse/TIKA-2576
> Project: Tika
> Issue Type: Improvement
> Components: detector, parser
> Reporter: Andreas Meier
> Priority: Minor
> Attachments: huffman-compressed-larger, huffmann-compressed-larger-result.txt
>
>
> The IETF is currently reviewing the specification of Zstandard compression and the application/zstd media type: [https://tools.ietf.org/id/draft-kucherawy-dispatch-zstd-01.html|https://tools.ietf.org/id/draft-kucherawy-dispatch-zstd-01.html]
> As soon as application/zstd is accepted as a standard media type, Tika should support it.
> Possible mime detection entry for tika-mimetypes.xml (the second comment has to be changed when the standard is final):
> {code:xml}
> <mime-type type="application/zstd">
>   <_comment>https://en.wikipedia.org/wiki/Zstandard</_comment>
>   <_comment>https://tools.ietf.org/id/draft-kucherawy-dispatch-zstd-01.html</_comment>
>   <magic priority="50">
>     <match value="0xFD2FB528" type="little32" offset="0"/>
>   </magic>
>   <glob pattern="*.zstd"/>
> </mime-type>
> {code}
> commons-compress version 1.16 and later provides a compressor and decompressor for the algorithm, based on com.github.luben zstd-jni [https://github.com/luben/zstd-jni|https://github.com/luben/zstd-jni]
> Attached are a sample zstd file (huffman-compressed-larger) and the result after decompressing it.
> Decompression was done with commons-compress 1.16.1 and zstd-jni 1.3.3-3.
> {code:xml}
> <dependency>
>   <groupId>org.apache.commons</groupId>
>   <artifactId>commons-compress</artifactId>
>   <version>1.16.1</version>
> </dependency>
> <dependency>
>   <groupId>com.github.luben</groupId>
>   <artifactId>zstd-jni</artifactId>
>   <version>1.3.3-3</version>
> </dependency>
> {code}
> Regards
> Andreas
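For reference, the magic rule proposed in the quoted issue (value 0xFD2FB528, type little32, offset 0) can be sketched in plain Java. This is only an illustration of what that rule checks, not how Tika's detector is implemented; the class name ZstdMagicSketch and the method looksLikeZstd are hypothetical and exist in neither Tika nor commons-compress.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of Tika or commons-compress.
class ZstdMagicSketch {

    // Magic number taken from the proposed <match> rule (a little-endian 32-bit value).
    static final int ZSTD_MAGIC = 0xFD2FB528;

    // Returns true if the first four bytes, read as a little-endian int, equal the magic number.
    static boolean looksLikeZstd(byte[] header) {
        if (header == null || header.length < 4) {
            return false;
        }
        int first = ByteBuffer.wrap(header, 0, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        return first == ZSTD_MAGIC;
    }

    public static void main(String[] args) {
        // On disk the magic appears as the byte sequence 28 B5 2F FD.
        byte[] zstdStart = {(byte) 0x28, (byte) 0xB5, (byte) 0x2F, (byte) 0xFD, 0x00};
        byte[] gzipStart = {(byte) 0x1F, (byte) 0x8B, 0x08, 0x00};
        System.out.println(looksLikeZstd(zstdStart));
        System.out.println(looksLikeZstd(gzipStart));
    }
}
```

Because the rule uses type little32, the value 0xFD2FB528 corresponds to the bytes 28 B5 2F FD at offset 0 of the file, which is why the sketch reads the header in little-endian order rather than comparing against FD 2F B5 28.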
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)