Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2022/10/26 23:13:00 UTC

[jira] [Created] (TIKA-3904) OverrideDetector doesn't work robustly with custom configs

Tim Allison created TIKA-3904:
---------------------------------

             Summary: OverrideDetector doesn't work robustly with custom configs
                 Key: TIKA-3904
                 URL: https://issues.apache.org/jira/browse/TIKA-3904
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


We added an OverrideDetector that returns whatever value a user or a parser set in TikaCoreProperties.CONTENT_TYPE_USER_OVERRIDE or TikaCoreProperties.CONTENT_TYPE_PARSER_OVERRIDE.  Users sometimes want a file parsed as if it were a specific mime type, and parsers sometimes do their own detection on embedded files, so there's no need to redo detection.

The current solution, however, does not work if someone adds a custom detector alongside the default one:
<detectors>
  <detector default>
  <detector custom>
</detectors>

The problem is that the override is found by the default detector and returned as a mime type, which can then be overwritten by the custom detector.
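
To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use Tika's classes: SimpleDetector, OverwriteSketch, and the "user-override" key are hypothetical stand-ins, and the merging rule (any later answer other than application/octet-stream wins) is a simplifying assumption, not Tika's actual composite logic.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a detector: maps metadata to a mime type string.
interface SimpleDetector {
    String detect(Map<String, String> metadata);
}

class OverwriteSketch {
    static final String OCTET_STREAM = "application/octet-stream";
    // Hypothetical metadata key standing in for the user override property.
    static final String USER_OVERRIDE = "user-override";

    // Stand-in for the override detector: returns the override if one is set.
    static final SimpleDetector OVERRIDE =
            md -> md.getOrDefault(USER_OVERRIDE, OCTET_STREAM);

    // Stand-in for a custom detector that knows nothing about overrides.
    static final SimpleDetector CUSTOM = md -> "text/plain";

    // Simplified composite (assumption): a later, non-generic answer
    // replaces whatever an earlier detector returned.
    static String detect(List<SimpleDetector> detectors, Map<String, String> md) {
        String type = OCTET_STREAM;
        for (SimpleDetector d : detectors) {
            String detected = d.detect(md);
            if (!OCTET_STREAM.equals(detected)) {
                type = detected;
            }
        }
        return type;
    }

    public static void main(String[] args) {
        Map<String, String> md = Map.of(USER_OVERRIDE, "application/pdf");
        // Override detector alone: the override survives.
        System.out.println(detect(List.of(OVERRIDE), md));
        // Override detector followed by a custom detector: the override is lost.
        System.out.println(detect(List.of(OVERRIDE, CUSTOM), md));
    }
}
```

Under these assumptions the first call returns the override and the second returns the custom detector's answer, which is the behavior described above.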

It feels like the only solution is to put the override detection in the CompositeDetector and move away from using the OverrideDetector.
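
A rough sketch of that idea, again with hypothetical names rather than Tika's real classes: the composite checks the override keys itself before consulting any child detector, so no child can overwrite an override. The class OverrideFirstComposite, the two string keys, and the choice to prefer the user override over the parser override are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class OverrideFirstComposite {
    static final String OCTET_STREAM = "application/octet-stream";
    // Hypothetical keys standing in for the two override properties.
    static final String USER_OVERRIDE = "user-override";
    static final String PARSER_OVERRIDE = "parser-override";

    // Hypothetical custom detector that always answers text/plain.
    static final Function<Map<String, String>, String> CUSTOM = md -> "text/plain";

    private final List<Function<Map<String, String>, String>> detectors;

    OverrideFirstComposite(List<Function<Map<String, String>, String>> detectors) {
        this.detectors = detectors;
    }

    String detect(Map<String, String> md) {
        // Check overrides first (assumed order: user, then parser).
        String override = md.get(USER_OVERRIDE);
        if (override == null) {
            override = md.get(PARSER_OVERRIDE);
        }
        if (override != null) {
            return override; // child detectors are never consulted
        }
        // No override: fall back to the child detectors, using the same
        // simplified "later non-generic answer wins" rule.
        String type = OCTET_STREAM;
        for (Function<Map<String, String>, String> d : detectors) {
            String detected = d.apply(md);
            if (!OCTET_STREAM.equals(detected)) {
                type = detected;
            }
        }
        return type;
    }
}
```

With this arrangement the order of detectors in the config no longer matters for overrides, because the check happens in one place before any configured detector runs.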

In Tika 3.x, I think we need to move all data bits that are used by the parsers out of metadata objects and into the ParseContext, but that's down the road.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)