Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2013/10/05 18:32:42 UTC
[jira] [Commented] (TIKA-1177) Add Matroska (mkv, mka) format detection
[ https://issues.apache.org/jira/browse/TIKA-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787254#comment-13787254 ]
Nick Burch commented on TIKA-1177:
----------------------------------
Matroska is a container format, so, as with OLE2, Ogg and Zip, we can't do fully accurate detection with magic bytes alone.
I've added a common parent type in r1529477, which allows more of the tests to pass. Full detection needs a custom detector; see TIKA-1180.
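A custom detector can read the EBML header's structure instead of matching fixed bytes. The following is a minimal sketch of that idea, written in Python for brevity rather than against Tika's Java `Detector` interface. The function names are hypothetical. It assumes only the EBML encoding rules: variable-length integers, the EBML header ID 0x1A45DFA3, and the DocType element ID 0x4282. It walks the header's child elements and returns the DocType string, so it does not depend on DocType being the first child.

```python
from __future__ import annotations

# EBML element IDs (EBML header and its DocType child).
EBML_HEADER_ID = 0x1A45DFA3
DOCTYPE_ID = 0x4282


def _vint_length(first_byte: int) -> int:
    """Length in bytes of an EBML variable-length integer, given its first byte."""
    for length in range(1, 9):
        if first_byte & (0x80 >> (length - 1)):
            return length
    raise ValueError("invalid VINT: first byte is 0x00")


def _read_vint(buf: bytes, pos: int, keep_marker: bool) -> tuple[int, int]:
    """Read a VINT at pos and return (value, next position).

    Element IDs keep the length-marker bit; data sizes have it stripped.
    """
    length = _vint_length(buf[pos])
    if pos + length > len(buf):
        raise ValueError("truncated VINT")
    value = int.from_bytes(buf[pos:pos + length], "big")
    if not keep_marker:
        value &= (1 << (7 * length)) - 1  # clear the length-marker bit
    return value, pos + length


def read_doctype(buf: bytes) -> str | None:
    """Return the DocType string from an EBML header, or None if not found."""
    try:
        element_id, pos = _read_vint(buf, 0, keep_marker=True)
        if element_id != EBML_HEADER_ID:
            return None
        header_size, pos = _read_vint(buf, pos, keep_marker=False)
        end = min(pos + header_size, len(buf))
        # Walk the header's children rather than assuming DocType comes first.
        while pos < end:
            child_id, pos = _read_vint(buf, pos, keep_marker=True)
            child_size, pos = _read_vint(buf, pos, keep_marker=False)
            if child_id == DOCTYPE_ID:
                if pos + child_size > len(buf):
                    return None
                # EBML strings may carry trailing zero padding.
                raw = buf[pos:pos + child_size]
                return raw.decode("ascii", "replace").rstrip("\x00")
            pos += child_size
    except (IndexError, ValueError):
        # Empty or malformed input: treat it as "not EBML".
        return None
    return None
```

For example, `read_doctype(bytes.fromhex("1A45DFA38F428681014282886d6174726f736b61"))` returns `"matroska"` even though EBMLVersion (0x4286) precedes DocType in that header. Both .mkv and .mka files use the DocType "matroska", so this check alone cannot tell video from audio. That would require looking further into the file (for example at its tracks) or falling back to the filename.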
> Add Matroska (mkv, mka) format detection
> ----------------------------------------
>
> Key: TIKA-1177
> URL: https://issues.apache.org/jira/browse/TIKA-1177
> Project: Tika
> Issue Type: Improvement
> Components: mime
> Affects Versions: 1.4
> Reporter: Boris Naguet
> Assignee: Ray Gauss II
> Priority: Minor
> Fix For: 1.5
>
>
> There is no MIME type detection for the Matroska format, although it is a popular video format.
> Here is what I added to my custom MIME types to detect them:
> {code}
> <mime-type type="video/x-matroska">
>   <glob pattern="*.mkv" />
>   <magic priority="40">
>     <match value="0x1A45DFA3934282886d6174726f736b61" type="string" offset="0" />
>   </magic>
> </mime-type>
> <mime-type type="audio/x-matroska">
>   <glob pattern="*.mka" />
> </mime-type>
> {code}
> I found the MKV signature at:
> http://www.garykessler.net/library/file_sigs.html
> I could not find a clear signature for MKA, but detection by filename is still useful.
> That said, the full specification is available here:
> http://matroska.org/technical/specs/index.html
> Detection may need to be more involved than this constant magic, but it works on my test files.
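The reporter's `<match>` is a fixed 16-byte pattern at offset 0. The two `<glob>` entries are the only thing that separates video from audio, because both kinds of file use the DocType "matroska". The Python sketch below annotates those bytes and mirrors both rules. It is an illustration outside Tika, and the helper names and the reordered sample header are hypothetical. It shows that the pattern only fits headers laid out exactly like the sample. A well-formed header with a different size byte or child order, such as one that writes EBMLVersion before DocType, does not match.

```python
from __future__ import annotations

# The <match> value from the mime-types snippet, decoded from hex:
#   1A 45 DF A3   EBML header element ID
#   93            header size as a 1-byte VINT: 0x93 & 0x7F = 19 bytes
#   42 82         DocType element ID
#   88            DocType size as a 1-byte VINT: 8 bytes
#   6D ... 61     the ASCII string "matroska"
MKV_MAGIC = bytes.fromhex("1A45DFA3934282886d6174726f736b61")


def matches_constant_magic(data: bytes) -> bool:
    """A fixed-bytes match at offset 0 amounts to a prefix comparison."""
    return data.startswith(MKV_MAGIC)


def type_from_name(filename: str) -> str | None:
    """Filename fallback mirroring the two <glob> patterns (case-insensitive here)."""
    name = filename.lower()
    if name.endswith(".mkv"):
        return "video/x-matroska"
    if name.endswith(".mka"):
        return "audio/x-matroska"
    return None


# Hypothetical sample: EBMLVersion (0x4286) is written before DocType, so both
# the header size byte (0x8F) and the child order differ from the constant.
REORDERED_HEADER = bytes.fromhex("1A45DFA38F428681014282886d6174726f736b61")
```

Here `matches_constant_magic(MKV_MAGIC)` is `True` and `matches_constant_magic(REORDERED_HEADER)` is `False`, even though both headers declare the DocType "matroska". `type_from_name("song.mka")` gives `"audio/x-matroska"`, which no amount of magic on the header alone could distinguish from video.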
--
This message was sent by Atlassian JIRA
(v6.1#6144)