Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2022/10/20 13:22:00 UTC

[jira] [Commented] (TIKA-1508) Add uniformity to parser parameter configuration

    [ https://issues.apache.org/jira/browse/TIKA-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621103#comment-17621103 ] 

Tim Allison commented on TIKA-1508:
-----------------------------------

I think we're basically good with this in 2.x now.  We still need to add generic serialization of the params for parsers (and detectors?).  I'm going to start working on this now.

Down the road, we can consolidate how we're configuring params, but we can live with the hodgepodge we have for now.

> Add uniformity to parser parameter configuration
> ------------------------------------------------
>
>                 Key: TIKA-1508
>                 URL: https://issues.apache.org/jira/browse/TIKA-1508
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Assignee: Chris A. Mattmann
>            Priority: Major
>             Fix For: 1.14
>
>
> We can currently configure parsers by the following means:
> 1) programmatically by direct calls to the parsers or their config objects
> 2) sending in a config object through the ParseContext
> 3) modifying .properties files for specific parsers (e.g. PDFParser)
> Rather than scattering the landscape with .properties files for each parser, it would be great if we could specify parser parameters in the main config file, something along the lines of this:
> {noformat}
>     <parser class="org.apache.tika.parser.audio.AudioParser">
>       <params>
>         <int name="someparam1">2</int>
>         <str name="someOtherParam2">something or other</str>
>       </params>
>       <mime>audio/basic</mime>
>       <mime>audio/x-aiff</mime>
>       <mime>audio/x-wav</mime>
>     </parser>
> {noformat}
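
To make the proposal above a bit more concrete, here is a minimal sketch of how typed params in that shape could be read into a name-to-value map. It is only an illustration under stated assumptions, not how Tika actually loads its config: it uses plain JDK XML parsing, the class name ParamSketch and the method readParams are hypothetical, and only the two element types from the example (int and str) are handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of Tika.
public class ParamSketch {

    /** Reads the typed children of every <params> element into a name -> value map. */
    static Map<String, Object> readParams(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // LinkedHashMap keeps the params in the order they appear in the config.
        Map<String, Object> params = new LinkedHashMap<>();
        NodeList paramsNodes = doc.getElementsByTagName("params");
        for (int i = 0; i < paramsNodes.getLength(); i++) {
            NodeList children = paramsNodes.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node node = children.item(j);
                if (!(node instanceof Element)) {
                    continue; // skip whitespace text nodes between elements
                }
                Element element = (Element) node;
                String name = element.getAttribute("name");
                String text = element.getTextContent().trim();
                // The element name carries the type, as in the example config.
                switch (element.getTagName()) {
                    case "int":
                        params.put(name, Integer.valueOf(text));
                        break;
                    case "str":
                        params.put(name, text);
                        break;
                    default:
                        throw new IllegalArgumentException(
                                "Unsupported param type: " + element.getTagName());
                }
            }
        }
        return params;
    }

    public static void main(String[] args) throws Exception {
        String xml =
                "<parser class=\"org.apache.tika.parser.audio.AudioParser\">"
              + "  <params>"
              + "    <int name=\"someparam1\">2</int>"
              + "    <str name=\"someOtherParam2\">something or other</str>"
              + "  </params>"
              + "  <mime>audio/basic</mime>"
              + "</parser>";
        Map<String, Object> params = readParams(xml);
        System.out.println(params);
    }
}
```

A map like this could then be handed to a parser in one place, which is the kind of uniformity the issue asks for; how the values would actually be bound to parser fields is left open here.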


