Posted to dev@tika.apache.org by "Giorgiana Ciobanu (Jira)" <ji...@apache.org> on 2022/07/05 09:57:00 UTC

[jira] [Created] (TIKA-3810) Vtt file (encoding UTF-8 with BOM) seen as text/plain

Giorgiana Ciobanu created TIKA-3810:
---------------------------------------

             Summary: Vtt file (encoding UTF-8 with BOM) seen as text/plain
                 Key: TIKA-3810
                 URL: https://issues.apache.org/jira/browse/TIKA-3810
             Project: Tika
          Issue Type: Bug
          Components: core, detector, mime
    Affects Versions: 2.3.0
            Reporter: Giorgiana Ciobanu
         Attachments: s5_windowEncoding_validFormat.vtt

A VTT file created on Windows (encoded as UTF-8 with BOM) is incorrectly detected as _text/plain_; it should be detected as _text/vtt_.
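One plausible cause is that the _text/vtt_ magic signature only matches the string WEBVTT at byte offset 0. This is an assumption that has not been checked against Tika's tika-mimetypes.xml. In a file saved as UTF-8 with BOM, the first three bytes are EF BB BF, which pushes WEBVTT to offset 3. The WebVTT specification allows an optional BOM before the WEBVTT header. The sketch below uses a hypothetical helper (WebVttSniffer is not Tika code). It contrasts an offset-0 check with one that skips a leading BOM.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper, not part of Tika: sniffs the leading bytes of a file
 * for the WebVTT header, optionally preceded by a UTF-8 byte order mark.
 */
public class WebVttSniffer {

    // UTF-8 encoding of U+FEFF, as commonly written by Windows editors.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
    private static final byte[] SIGNATURE = "WEBVTT".getBytes(StandardCharsets.US_ASCII);

    /** Naive check: "WEBVTT" must start at byte offset 0, so a BOM defeats it. */
    public static boolean matchesAtOffsetZero(byte[] head) {
        return startsWith(head, SIGNATURE, 0);
    }

    /** BOM-tolerant check: skips an optional UTF-8 BOM before "WEBVTT". */
    public static boolean isWebVtt(byte[] head) {
        int offset = startsWith(head, UTF8_BOM, 0) ? UTF8_BOM.length : 0;
        if (!startsWith(head, SIGNATURE, offset)) {
            return false;
        }
        int next = offset + SIGNATURE.length;
        if (next == head.length) {
            return true; // the sniffed bytes end right after the header
        }
        // Reject e.g. "WEBVTTX": the header must be followed by space, tab or a line break.
        byte b = head[next];
        return b == ' ' || b == '\t' || b == '\r' || b == '\n';
    }

    private static boolean startsWith(byte[] data, byte[] prefix, int offset) {
        if (data.length - offset < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[offset + i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] plain = "WEBVTT\r\n\r\n00:00.000 --> 00:01.000\r\nHello\r\n"
                .getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[UTF8_BOM.length + plain.length];
        System.arraycopy(UTF8_BOM, 0, withBom, 0, UTF8_BOM.length);
        System.arraycopy(plain, 0, withBom, UTF8_BOM.length, plain.length);

        System.out.println(matchesAtOffsetZero(plain));   // true
        System.out.println(matchesAtOffsetZero(withBom)); // false
        System.out.println(isWebVtt(withBom));            // true
    }
}
```

If the offset-0 assumption holds, the BOM-prefixed file fails the signature match and falls back to a generic text type.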

The application that uses Tika, and to which the file is uploaded for MIME type detection, runs on a Unix machine.

The VTT file is passed as an InputStream to Tika's default detector (we do not want to detect the MIME type from the file extension).
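Because detection runs on the stream and not the file name, the detector only sees the leading bytes of the content. Below is a minimal sketch using plain java.io. StreamPeek, peek and hasUtf8Bom are hypothetical names, not Tika's API. It peeks at those bytes with mark/reset, checks for a UTF-8 BOM, and leaves the stream rewound so the caller can still read the whole file. Tika's Detector interface documents a similar mark/reset expectation for the stream it is given.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Illustrative sketch with hypothetical names; not Tika code. */
public class StreamPeek {

    /** Reads up to {@code limit} leading bytes, then rewinds the stream. */
    public static byte[] peek(InputStream in, int limit) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        try {
            in.mark(limit);
            byte[] buf = new byte[limit];
            int total = 0;
            while (total < limit) {
                int n = in.read(buf, total, limit - total);
                if (n == -1) {
                    break; // stream shorter than limit
                }
                total += n;
            }
            in.reset(); // rewind so later readers see the full content
            return Arrays.copyOf(buf, total);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if the bytes start with EF BB BF, the UTF-8 encoding of U+FEFF. */
    public static boolean hasUtf8Bom(byte[] head) {
        return head.length >= 3
                && head[0] == (byte) 0xEF
                && head[1] == (byte) 0xBB
                && head[2] == (byte) 0xBF;
    }

    public static void main(String[] args) throws IOException {
        byte[] content = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'W', 'E', 'B', 'V', 'T', 'T', '\n'};
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(content));
        byte[] head = peek(in, 8);
        System.out.println(hasUtf8Bom(head)); // true
        // The stream was reset, so the next byte read is still the BOM's first byte.
        System.out.println(in.read());        // 239
    }
}
```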

Attached is the VTT file that Tika detects as text/plain.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)