Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2018/06/15 14:22:04 UTC

[jira] [Commented] (TIKA-2663) Allow nested decorations for the default parser

    [ https://issues.apache.org/jira/browse/TIKA-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513876#comment-16513876 ] 

Nick Burch commented on TIKA-2663:
----------------------------------

I think we'd need a new section for this, since we have explicit logic to prevent you from configuring the auto-detect parser directly. (The view has always been that you use the config to pick the real parsers, then wrap the result with autodetect if needed.)

Would it be better to have an {{AutoDetectParserFactory}} or {{Builder}} with a bunch of nice options / methods on it, to let people select the decorations they want? Are there any cases other than {{AutoDetectParser}} where you might want to enable the same kinds of things?
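
To make the builder idea a bit more concrete, here is a rough, self-contained sketch of how ordered wrapping might look. Everything in it is hypothetical: {{SketchParser}}, {{Builder}} and {{wrapWith}} are stand-ins invented for illustration, not Tika's real classes or API, and the string labels only record the nesting order (digest, then recursive, then fork, as in the issue description) rather than doing any parsing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch only: none of these types are Tika's real classes.
final class ParserDecorationSketch {

    /** Stand-in for a parser; describe() only reports the wrapping order. */
    interface SketchParser {
        String describe();
    }

    /** Collects decorators and applies them in call order, innermost first. */
    static final class Builder {
        private final SketchParser base;
        private final List<UnaryOperator<SketchParser>> decorators = new ArrayList<>();

        Builder(SketchParser base) {
            this.base = base;
        }

        /** Registers a named wrapper (assumed name; a real one would wrap behaviour). */
        Builder wrapWith(String name) {
            decorators.add(inner -> () -> name + "(" + inner.describe() + ")");
            return this;
        }

        /** Applies each registered wrapper around the result of the previous one. */
        SketchParser build() {
            SketchParser current = base;
            for (UnaryOperator<SketchParser> decorator : decorators) {
                current = decorator.apply(current);
            }
            return current;
        }
    }

    /** Builds the nesting from the issue: digest, then recursive, then fork. */
    static String demo() {
        SketchParser autoDetect = () -> "autodetect";
        return new Builder(autoDetect)
                .wrapWith("digest")
                .wrapWith("recursive")
                .wrapWith("fork")
                .build()
                .describe();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the sketch is only that call order determines nesting, so the last wrapper added ends up outermost. Whether that belongs in a builder, a factory, or a new config section is still the open question.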

> Allow nested decorations for the default parser
> -----------------------------------------------
>
>                 Key: TIKA-2663
>                 URL: https://issues.apache.org/jira/browse/TIKA-2663
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>
> I'm not sure what the cleanest solution is, but it would be nice to specify decorations on the AutoDetectParser somehow in tika-config.xml.
> For example, I want the AutoDetectParser, but wrap it in a ForkParser.  Or, I want the AutoDetectParser, but wrap it in a DigestingParser, then a RecursiveParserWrapper, then a ForkParser.
> These types of decorations feel fundamentally different to me than our current decorations which focus on child parsers.  I've done some really ugly things to get this functionality for tika-app and tika-batch, and it would be useful to clean this up.  Any ideas?


