Posted to dev@tika.apache.org by "Boris Naguet (JIRA)" <ji...@apache.org> on 2013/10/02 17:48:44 UTC
[jira] [Created] (TIKA-1177) Add Matroska (mkv, mka) format detection
Boris Naguet created TIKA-1177:
----------------------------------
Summary: Add Matroska (mkv, mka) format detection
Key: TIKA-1177
URL: https://issues.apache.org/jira/browse/TIKA-1177
Project: Tika
Issue Type: Improvement
Components: mime
Affects Versions: 1.4
Reporter: Boris Naguet
Priority: Minor
Tika has no MIME type detection for the Matroska format, although it is a popular video container format.
Here is some code I added in my custom mimetypes to detect them:
{code}
<mime-type type="video/x-matroska">
  <glob pattern="*.mkv" />
  <magic priority="40">
    <match value="0x1A45DFA3934282886d6174726f736b61" type="string" offset="0" />
  </magic>
</mime-type>
<mime-type type="audio/x-matroska">
  <glob pattern="*.mka" />
</mime-type>
{code}
I found the signature for mkv at:
http://www.garykessler.net/library/file_sigs.html
I could not find a clear signature for mka, but detection by filename is still useful.
The full spec is available here:
http://matroska.org/technical/specs/index.html
The format may be a bit more complex than this constant magic, but it works on my test files.
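As a rough illustration of what the constant magic amounts to, the sketch below decodes the 16-byte value from the match rule and compares it against the start of a buffer. It is standalone Java and does not use Tika's API. The class and method names (MatroskaSignatureSketch, hexToBytes, startsWithSignature) are hypothetical and chosen only for this example. The last eight bytes of the value are the ASCII text "matroska", so a fixed-offset match like this assumes that string always sits at byte 8 of the file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Tika: mirrors the fixed-offset magic check.
public class MatroskaSignatureSketch {
    // The 16 bytes from the <match> value in the report, without the 0x prefix.
    static final String SIGNATURE_HEX = "1A45DFA3934282886d6174726f736b61";

    // Decode a hex string (two digits per byte) into raw bytes.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // True if the data begins, at offset 0, with the constant signature.
    static boolean startsWithSignature(byte[] data) {
        byte[] sig = hexToBytes(SIGNATURE_HEX);
        if (data.length < sig.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOfRange(data, 0, sig.length), sig);
    }

    public static void main(String[] args) {
        byte[] sig = hexToBytes(SIGNATURE_HEX);
        // Bytes 8..15 of the signature are plain ASCII.
        System.out.println(new String(sig, 8, 8, StandardCharsets.US_ASCII)); // prints "matroska"
        System.out.println(startsWithSignature(sig)); // prints "true"
    }
}
```

A file whose header places the "matroska" string at a different offset would not match this check, which is one way the constant magic could turn out to be too simple.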
--
This message was sent by Atlassian JIRA
(v6.1#6144)