Posted to dev@tika.apache.org by "Luís Filipe Nassif (Jira)" <ji...@apache.org> on 2023/08/10 14:52:00 UTC

[jira] [Updated] (TIKA-1180) Better Matroska MKV and WEBM Detection

     [ https://issues.apache.org/jira/browse/TIKA-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luís Filipe Nassif updated TIKA-1180:
-------------------------------------
    Summary: Better Matroska MKV and WEBM Detection  (was: Matroska (mkv, mka, webm) Detector)

> Better Matroska MKV and WEBM Detection
> --------------------------------------
>
>                 Key: TIKA-1180
>                 URL: https://issues.apache.org/jira/browse/TIKA-1180
>             Project: Tika
>          Issue Type: New Feature
>          Components: detector
>    Affects Versions: 1.5
>            Reporter: Nick Burch
>            Priority: Major
>              Labels: new-parser
>         Attachments: sample-mkv.noext, sample-webm.noext
>
>
> Following the work on TIKA-1177, we now have mimetype entries for the various formats based on the Matroska container (mkv, mka, webm, etc.). However, mime magic alone cannot reliably identify the specific type.
> Instead, for fully accurate detection, we'll need a new Detector for the Matroska family that does some very simple container/stream processing to work out what the container holds.
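
As a rough illustration of the "very simple container processing" described above, the sketch below reads the EBML header at the start of a file and returns its DocType string, which is "matroska" for mkv/mka and "webm" for WebM. It is not the Tika implementation: the class name EbmlDocTypeSniffer, the method readDocType and the MAX_SNIFF limit are hypothetical, and only the element IDs (0x1A45DFA3 for the EBML header, 0x4282 for DocType) come from the EBML format itself. Telling mkv from mka would additionally require inspecting the track entries, which this sketch does not attempt.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika: extracts the EBML DocType.
final class EbmlDocTypeSniffer {
    private static final long EBML_HEADER_ID = 0x1A45DFA3L;
    private static final long DOC_TYPE_ID = 0x4282L;
    private static final int MAX_SNIFF = 1024; // assumed upper bound on bytes examined

    // Reads an EBML variable-length integer. Element IDs keep their
    // length-marker bit; element sizes have it masked off.
    private static long readVint(ByteBuffer buf, boolean keepMarker) {
        int first = buf.get() & 0xFF;
        if (first == 0) {
            throw new IllegalArgumentException("vint longer than 8 bytes");
        }
        int length = Integer.numberOfLeadingZeros(first) - 24 + 1;
        long value = keepMarker ? first : (first & (0xFF >> length));
        for (int i = 1; i < length; i++) {
            value = (value << 8) | (buf.get() & 0xFF);
        }
        return value;
    }

    // Returns the DocType found in the first n bytes of head, or null
    // if the data does not start with a well-formed EBML header.
    static String readDocType(byte[] head, int n) {
        ByteBuffer buf = ByteBuffer.wrap(head, 0, n);
        try {
            if (readVint(buf, true) != EBML_HEADER_ID) {
                return null;
            }
            long headerSize = readVint(buf, false);
            int end = (int) Math.min(buf.limit(), buf.position() + headerSize);
            while (buf.position() < end) {
                long id = readVint(buf, true);
                long size = readVint(buf, false);
                if (size > end - buf.position()) {
                    return null; // element runs past the data we have
                }
                if (id == DOC_TYPE_ID) {
                    byte[] raw = new byte[(int) size];
                    buf.get(raw);
                    // trim() also drops any trailing NUL padding
                    return new String(raw, StandardCharsets.US_ASCII).trim();
                }
                buf.position(buf.position() + (int) size); // skip other elements
            }
            return null;
        } catch (BufferUnderflowException | IllegalArgumentException e) {
            return null; // truncated or malformed header
        }
    }

    // Convenience overload: sniffs at most MAX_SNIFF bytes from a stream.
    static String readDocType(InputStream in) throws IOException {
        byte[] head = new byte[MAX_SNIFF];
        int n = in.readNBytes(head, 0, head.length);
        return readDocType(head, n);
    }
}
```

A detector built on this idea would presumably mark the stream before sniffing and reset it afterwards, then map the returned DocType (plus, for "matroska", whatever track information it gathers) to one of the mimetypes added in TIKA-1177.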



--
This message was sent by Atlassian Jira
(v8.20.10#820010)