Posted to dev@tika.apache.org by "Boris Naguet (JIRA)" <ji...@apache.org> on 2013/10/02 17:48:44 UTC

[jira] [Created] (TIKA-1177) Add Matroska (mkv, mka) format detection

Boris Naguet created TIKA-1177:
----------------------------------

             Summary: Add Matroska (mkv, mka) format detection
                 Key: TIKA-1177
                 URL: https://issues.apache.org/jira/browse/TIKA-1177
             Project: Tika
          Issue Type: Improvement
          Components: mime
    Affects Versions: 1.4
            Reporter: Boris Naguet
            Priority: Minor


Tika has no MIME type detection for the Matroska format, even though it is a popular video format.
Here is the code I added to my custom mimetypes to detect it:

{code}
	<mime-type type="video/x-matroska">
		<glob pattern="*.mkv" />
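		<magic priority="40">
			<!-- EBML header ID (1A45DFA3), size byte (93), DocType ID (4282), size (88), then ASCII "matroska" -->
			<match value="0x1A45DFA3934282886d6174726f736b61" type="string" offset="0" />
		</magic>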
	</mime-type>
	<mime-type type="audio/x-matroska">
		<glob pattern="*.mka" />
	</mime-type>
{code}
I found the signature for mkv at:
http://www.garykessler.net/library/file_sigs.html
I could not find a clear signature for mka, but detection by filename is still useful.

The full spec is available here:
http://matroska.org/technical/specs/index.html
Detection may be a bit more complex than this constant magic, but it works on my test files.
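
As a rough illustration of what the constant magic above checks, here is a minimal Java sketch that compares the first 16 bytes of a buffer against the same hex string. It does not use Tika's API; the class and method names (MatroskaMagic, fromHex, startsWithMagic) are made up for this example. It also assumes, like the signature itself, that the EBML header size byte is 0x93 and that DocType is the first element in the header, which may not hold for every file.

```java
import java.util.Arrays;

public class MatroskaMagic {
    // Hex digits copied from the <match> value in the issue, without the "0x" prefix.
    static final String MAGIC_HEX = "1A45DFA3934282886d6174726f736b61";

    /** Decodes a string of hex digit pairs into bytes. */
    public static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Returns true if the buffer begins with the fixed 16-byte signature. */
    public static boolean startsWithMagic(byte[] head) {
        byte[] magic = fromHex(MAGIC_HEX);
        if (head == null || head.length < magic.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(head, magic.length), magic);
    }

    public static void main(String[] args) {
        // Hypothetical sample: the signature followed by two arbitrary bytes.
        byte[] sample = fromHex(MAGIC_HEX + "4287");
        System.out.println("Matroska signature found: " + startsWithMagic(sample));
    }
}
```

A header that starts with the same EBML ID but has a different size byte, or places another element before DocType, would not match this check. That is the kind of case where a fixed magic could miss valid files.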



--
This message was sent by Atlassian JIRA
(v6.1#6144)