Posted to dev@tika.apache.org by "Giorgiana Ciobanu (Jira)" <ji...@apache.org> on 2022/07/05 09:57:00 UTC
[jira] [Created] (TIKA-3810) Vtt file (encoding UTF-8 with BOM) seen as text/plain
Giorgiana Ciobanu created TIKA-3810:
---------------------------------------
Summary: Vtt file (encoding UTF-8 with BOM) seen as text/plain
Key: TIKA-3810
URL: https://issues.apache.org/jira/browse/TIKA-3810
Project: Tika
Issue Type: Bug
Components: core, detector, mime
Affects Versions: 2.3.0
Reporter: Giorgiana Ciobanu
Attachments: s5_windowEncoding_validFormat.vtt
A VTT file created on Windows (UTF-8 with BOM) is incorrectly detected as text/plain; it should be detected as text/vtt.
The application that uses Tika, and to which the file is uploaded for MIME type detection, runs on a Unix machine.
The VTT file is passed as an InputStream to Tika's default detector (we don't want to detect the MIME type from the file extension).
Please find attached the VTT file that Tika detects as text/plain.
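
For illustration, here is a minimal sketch in plain Java (no Tika dependency) of why a check for the "WEBVTT" signature at byte offset 0 would miss a file that begins with a UTF-8 BOM (EF BB BF), and how skipping those three bytes first avoids that. Whether this is what actually happens inside Tika's detector is an assumption. The names VttBomSketch, looksLikeVtt and withBom are hypothetical and are not part of Tika's API.

```java
import java.nio.charset.StandardCharsets;

public class VttBomSketch {
    // UTF-8 byte order mark, as written by many Windows editors.
    public static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
    // Signature that opens a WebVTT file (after the optional BOM).
    public static final byte[] WEBVTT_MAGIC = "WEBVTT".getBytes(StandardCharsets.US_ASCII);

    // True if 'data' contains 'prefix' starting at 'offset'.
    public static boolean startsWith(byte[] data, int offset, byte[] prefix) {
        if (offset < 0 || data.length - offset < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[offset + i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical BOM-tolerant check: skip a leading UTF-8 BOM, then look for the signature.
    public static boolean looksLikeVtt(byte[] head) {
        int offset = startsWith(head, 0, UTF8_BOM) ? UTF8_BOM.length : 0;
        return startsWith(head, offset, WEBVTT_MAGIC);
    }

    // Returns a copy of 'body' with a UTF-8 BOM prepended.
    public static byte[] withBom(byte[] body) {
        byte[] out = new byte[UTF8_BOM.length + body.length];
        System.arraycopy(UTF8_BOM, 0, out, 0, UTF8_BOM.length);
        System.arraycopy(body, 0, out, UTF8_BOM.length, body.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] body = "WEBVTT\n\n00:00.000 --> 00:01.000\nHello\n"
                .getBytes(StandardCharsets.UTF_8);
        byte[] bomFile = withBom(body);

        // Naive check at offset 0 versus the BOM-tolerant check.
        System.out.println("signature at offset 0: " + startsWith(bomFile, 0, WEBVTT_MAGIC));
        System.out.println("BOM-tolerant check:    " + looksLikeVtt(bomFile));
    }
}
```

A caller could run such a check on the first bytes of the uploaded stream as a workaround before falling back to the detector's own result.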
--
This message was sent by Atlassian Jira
(v8.20.10#820010)