Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2015/01/22 19:48:35 UTC

[jira] [Created] (TIKA-1528) Add an OverrideDetector that overrides other detectors

Tim Allison created TIKA-1528:
---------------------------------

             Summary: Add an OverrideDetector that overrides other detectors
                 Key: TIKA-1528
                 URL: https://issues.apache.org/jira/browse/TIKA-1528
             Project: Tika
          Issue Type: Improvement
            Reporter: Tim Allison
            Priority: Minor


While working on TIKA-1511, I found a need to bypass our current detection mechanism, and I think that there are other use cases for this.  The idea is that a client or a Tika-internal call wants to specify the Content-Type for a document and bypass the regular MIME detection chain.

We currently have the TypeDetector that returns the "Content-Type" as specified in the Metadata, but there are two deficiencies in using that class for this purpose:
* Content-Type is currently ambiguous: when it comes into a Parser or Detector, it could be used as a hint or as a directive.  I'd like the OverrideDetector to use a different metadata key from our usual "Content-Type".
* The position of the TypeDetector in the chain is based on the alphabetical order of its class name.  I'd like the OverrideDetector to run first and then short-circuit (bypass) the other detectors.
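
To make the two points above concrete, here is a minimal, self-contained sketch of the proposed behavior. It does not use Tika's real classes: the Detector interface, the metadata map, the key name "X-Override-Content-Type", and the "first non-default answer wins" rule for the regular chain are all assumptions made for illustration, not the existing API or a final design.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OverrideDetectionSketch {

    // Hypothetical key, deliberately distinct from the usual "Content-Type".
    static final String OVERRIDE_KEY = "X-Override-Content-Type";
    static final String OCTET_STREAM = "application/octet-stream";

    /** Simplified stand-in for a detector; returns OCTET_STREAM when it has no opinion. */
    interface Detector {
        String detect(byte[] head, Map<String, String> metadata);
    }

    /** Reports the caller-specified type, if any, and ignores the content. */
    static final class OverrideDetector implements Detector {
        @Override
        public String detect(byte[] head, Map<String, String> metadata) {
            String forced = metadata.get(OVERRIDE_KEY);
            return (forced == null || forced.isEmpty()) ? OCTET_STREAM : forced;
        }
    }

    /** Toy content-based detector that only recognizes the "%PDF" signature. */
    static final class MagicDetector implements Detector {
        @Override
        public String detect(byte[] head, Map<String, String> metadata) {
            boolean pdf = head.length >= 4
                    && head[0] == '%' && head[1] == 'P' && head[2] == 'D' && head[3] == 'F';
            return pdf ? "application/pdf" : OCTET_STREAM;
        }
    }

    /** Runs the override first; the regular chain is consulted only if no override is set. */
    static String detect(List<Detector> regularChain, byte[] head, Map<String, String> metadata) {
        String forced = new OverrideDetector().detect(head, metadata);
        if (!OCTET_STREAM.equals(forced)) {
            return forced; // short-circuit: the other detectors never run
        }
        for (Detector d : regularChain) {
            String type = d.detect(head, metadata);
            if (!OCTET_STREAM.equals(type)) {
                return type; // simplification: first non-default answer wins
            }
        }
        return OCTET_STREAM;
    }

    public static void main(String[] args) {
        byte[] head = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        List<Detector> chain = Collections.<Detector>singletonList(new MagicDetector());
        Map<String, String> metadata = new HashMap<>();

        System.out.println(detect(chain, head, metadata)); // regular detection
        metadata.put(OVERRIDE_KEY, "text/plain");
        System.out.println(detect(chain, head, metadata)); // override bypasses the chain
    }
}
```

Because the override lives under its own key, a "Content-Type" entry in the metadata can keep its current role as a hint, and the override's position no longer depends on class-name ordering.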
