Posted to dev@tika.apache.org by "martin k. (Jira)" <ji...@apache.org> on 2023/11/22 20:05:00 UTC
[jira] [Created] (TIKA-4172) Apple binary file incorrectly identified as text/x-sql due to filename
martin k. created TIKA-4172:
-------------------------------
Summary: Apple binary file incorrectly identified as text/x-sql due to filename
Key: TIKA-4172
URL: https://issues.apache.org/jira/browse/TIKA-4172
Project: Tika
Issue Type: Bug
Components: general
Affects Versions: 2.9.1
Reporter: martin k.
This is related to [https://github.com/eikek/docspell/issues/2376] and [https://github.com/eikek/docspell/issues/2403].
Take the following Base64 encoding of a binary, Apple-generated file (I have no idea what it does). You can recreate the file by piping the text below into, e.g., {{base64 -d > something.sql}}:
{code:java}
ABRkMDEwMWM2Nl9teVNRTDQwLnNxbAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAbUJJTgAA
AAAAAAAAAAAAAAAAAACCgf+/AAA=
{code}
If this file is named {{something.sql}}, Tika classifies it as {{text/x-sql}}, which it is not. It seems that more weight is given to the filename extension than to the fact that the content is binary.
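To make the "file is binary anyway" point concrete, here is a minimal, standard-library-only sketch that decodes the payload above and applies a simple content check. The class {{BinarySniff}}, its method names, and the NUL-byte rule are hypothetical and purely illustrative; this is not how Tika detects types. Reading the second byte as a length prefix for an embedded filename is likewise only an assumption based on looking at the decoded bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BinarySniff {
    // Base64 payload copied from the report above.
    static final String SAMPLE =
        "ABRkMDEwMWM2Nl9teVNRTDQwLnNxbAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
      + "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAbUJJTgAA"
      + "AAAAAAAAAAAAAAAAAACCgf+/AAA=";

    // The MIME decoder ignores line breaks and other non-alphabet characters.
    static byte[] decodeSample() {
        return Base64.getMimeDecoder().decode(SAMPLE);
    }

    // Illustrative heuristic (an assumption, not Tika's logic):
    // a NUL byte in the first 512 bytes means the content is not plain text.
    static boolean looksBinary(byte[] data) {
        int limit = Math.min(data.length, 512);
        for (int i = 0; i < limit; i++) {
            if (data[i] == 0) {
                return true;
            }
        }
        return false;
    }

    // Assumption: byte 1 is a length prefix and the name starts at byte 2.
    static String embeddedName(byte[] data) {
        if (data.length < 2) {
            return "";
        }
        int len = data[1] & 0xFF;
        if (2 + len > data.length) {
            return "";
        }
        return new String(data, 2, len, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] data = decodeSample();
        System.out.println("decoded bytes: " + data.length);
        System.out.println("looks binary:  " + looksBinary(data));
        System.out.println("embedded name: " + embeddedName(data));
    }
}
```

The very first byte of the payload is already NUL, so even a crude check like this one would argue against a text type, regardless of what the filename suggests.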