Posted to oak-issues@jackrabbit.apache.org by "Matt Ryan (JIRA)" <ji...@apache.org> on 2019/06/17 17:06:01 UTC

[jira] [Resolved] (OAK-7996) Ability to disable automatic text extraction via configuration

     [ https://issues.apache.org/jira/browse/OAK-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Ryan resolved OAK-7996.
----------------------------
    Resolution: Won't Fix

So far there has not been a strong enough argument against simply doing this via the Tika config.

> Ability to disable automatic text extraction via configuration
> --------------------------------------------------------------
>
>                 Key: OAK-7996
>                 URL: https://issues.apache.org/jira/browse/OAK-7996
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>            Reporter: Matt Ryan
>            Assignee: Matt Ryan
>            Priority: Major
>
> This issue is to discuss allowing a user to disable automatic text extraction of binary data via a configuration file.
> Currently you can save a tika.config file inside an index definition, which overrides the default Tika configuration for that index.  You can use this approach to disable automatic text extraction.
> I'd like to be able to do this at a global level - not per-index - via a configuration file instead.  Then somewhere inside the document maker code, we would check whether text extraction has been disabled by configuration for the candidate binary.
> The value in this approach is that two instances can have identical index definitions and differ only in local configuration.  Separate index definitions don't have to be maintained.  And if you want to change which files you extract text from, you don't have to refresh an index to make it happen.
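
A minimal sketch of the kind of global check the description proposes, under stated assumptions. The class name TextExtractionPolicy and the property keys textExtraction.disabled and textExtraction.disabledMimeTypes are hypothetical, invented for illustration; none of this is existing Oak or Tika API, and the issue itself was resolved as Won't Fix.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;

/**
 * Hypothetical global switch for text extraction. Not part of Oak;
 * it only illustrates a configuration-driven check that indexing code
 * could consult before handing a binary to a text extractor.
 */
final class TextExtractionPolicy {
    // Hypothetical property keys, chosen for this sketch only.
    static final String DISABLE_ALL = "textExtraction.disabled";
    static final String DISABLED_TYPES = "textExtraction.disabledMimeTypes";

    private final boolean disableAll;
    private final Set<String> disabledTypes;

    private TextExtractionPolicy(boolean disableAll, Set<String> disabledTypes) {
        this.disableAll = disableAll;
        this.disabledTypes = disabledTypes;
    }

    /** Builds a policy from local (per-instance) configuration. */
    static TextExtractionPolicy fromProperties(Properties props) {
        boolean all = Boolean.parseBoolean(props.getProperty(DISABLE_ALL, "false"));
        Set<String> types = new HashSet<>();
        // Comma-separated list of MIME types; blank entries are skipped.
        for (String entry : props.getProperty(DISABLED_TYPES, "").split(",")) {
            String normalized = entry.trim().toLowerCase(Locale.ROOT);
            if (!normalized.isEmpty()) {
                types.add(normalized);
            }
        }
        return new TextExtractionPolicy(all, types);
    }

    /** Returns true if extraction should be skipped for this MIME type. */
    boolean isExtractionDisabled(String mimeType) {
        if (disableAll) {
            return true;
        }
        if (mimeType == null) {
            return false;
        }
        return disabledTypes.contains(mimeType.trim().toLowerCase(Locale.ROOT));
    }
}
```

With such a check, two instances could share the same index definitions and differ only in the local properties file that feeds fromProperties; the caller would skip extraction whenever isExtractionDisabled returns true for the binary's MIME type.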


