Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2024/04/26 13:33:00 UTC

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

    [ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841242#comment-17841242 ] 

Tim Allison edited comment on TIKA-4243 at 4/26/24 1:32 PM:
------------------------------------------------------------

I really, really want to clean up our configuration, and moving to JSON makes sense. 

I agree we need to support the legacy config of 2.x in 3.x.

Is there a reason not to use plain old Jackson databind? What does jsonschema2pojo buy us?

Will this new capability live in tika-serialization?

It would be great to convert these config objects to records in Java 17, er, Tika 4.x.
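Here is a minimal sketch of what a record-based config object might look like, assuming Java 17. `PdfParserConfig` and its fields are hypothetical and are not Tika's actual classes. The compact constructor holds the validation and defaulting that a hand-written POJO would otherwise spread across setters.

```java
import java.util.List;

public class RecordConfigDemo {

    // Hypothetical config object; names are illustrative, not Tika's real API.
    public record PdfParserConfig(boolean sortByPosition, int maxPages, List<String> excludedTypes) {

        // Compact constructor: validate and normalize before the fields are assigned.
        public PdfParserConfig {
            if (maxPages < 0) {
                throw new IllegalArgumentException("maxPages must be >= 0, got " + maxPages);
            }
            // Defensive, unmodifiable copy; a null list becomes an empty one.
            excludedTypes = excludedTypes == null ? List.of() : List.copyOf(excludedTypes);
        }

        // Defaults for this sketch (0 = no page limit, by this sketch's own convention).
        public static PdfParserConfig defaults() {
            return new PdfParserConfig(false, 0, List.of());
        }

        // Records are immutable, so "changing" a setting returns a new instance.
        public PdfParserConfig withSortByPosition(boolean value) {
            return new PdfParserConfig(value, maxPages, excludedTypes);
        }
    }

    public static void main(String[] args) {
        PdfParserConfig config = PdfParserConfig.defaults().withSortByPosition(true);
        System.out.println(config);
    }
}
```

Records also supply `equals`, `hashCode`, and `toString` for free. Config objects loaded from different files can therefore be compared directly.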

Would this allow us to get rid of our, ahem, baroque config processing code and still read 2.x configs?  I admit responsibility for the baroque config stuff, and I would really appreciate the opportunity to get rid of it asap... as long as we have backwards compatibility.
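One possible backward-compatibility path is sketched below. It assumes Java 17 (records, text blocks) and uses only the JDK's DOM parser. A 2.x-style config, modeled on the `<properties>`/`<parsers>`/`<parser class=...>`/`<params>` layout, is read into a typed record. This is not Tika's actual `TikaConfig` loader. `LegacyConfigReader` and `ParserEntry` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class LegacyConfigReader {

    // Hypothetical typed model for one <parser> entry; not Tika's real class.
    public record ParserEntry(String className, Map<String, String> params) {
        public ParserEntry {
            params = Map.copyOf(params); // unmodifiable copy
        }
    }

    // Reads <properties><parsers><parser class="..."><params><param name="...">...</param>.
    public static List<ParserEntry> readParsers(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPE declarations to rule out XXE-style entity tricks.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<ParserEntry> result = new ArrayList<>();
        for (Element parsers : children(doc.getDocumentElement(), "parsers")) {
            for (Element parser : children(parsers, "parser")) {
                Map<String, String> params = new HashMap<>();
                for (Element paramsEl : children(parser, "params")) {
                    for (Element param : children(paramsEl, "param")) {
                        // The 2.x "type" attribute is ignored here; values stay strings.
                        params.put(param.getAttribute("name"), param.getTextContent().trim());
                    }
                }
                result.add(new ParserEntry(parser.getAttribute("class"), params));
            }
        }
        return result;
    }

    // Direct child elements with the given tag (skips text nodes and deeper descendants).
    private static List<Element> children(Element parent, String tag) {
        List<Element> out = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                out.add((Element) n);
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String legacy = """
            <properties>
              <parsers>
                <parser class="org.apache.tika.parser.DefaultParser">
                  <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
                </parser>
                <parser class="org.apache.tika.parser.pdf.PDFParser">
                  <params>
                    <param name="sortByPosition" type="bool">true</param>
                  </params>
                </parser>
              </parsers>
            </properties>
            """;
        for (ParserEntry entry : readParsers(legacy)) {
            System.out.println(entry);
        }
    }
}
```

A fuller converter would use the `type` attribute to fill typed record fields rather than a string map. Other 2.x elements such as `<parser-exclude>` would need their own mapping as well.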

Thank you [~ndipiazza]!


was (Author: tallison@mitre.org): the same comment, without the paragraph about the baroque config processing code.

> tika configuration overhaul
> ---------------------------
>
>                 Key: TIKA-4243
>                 URL: https://issues.apache.org/jira/browse/TIKA-4243
>             Project: Tika
>          Issue Type: New Feature
>          Components: config
>    Affects Versions: 3.0.0
>            Reporter: Nicholas DiPiazza
>            Priority: Major
>
> In 3.0.0, it would greatly help to have a typed configuration schema for Tika.
> In 3.x, can we remove the old way of doing configs and replace it with JSON Schema?
> JSON Schema can be converted to POJOs using a Maven plugin: [https://github.com/joelittlejohn/jsonschema2pojo]
> This automatically generates a Java POJO model we can use for the configs.
> The legacy tika-config XML could then be read and converted to the new POJOs with an XML mapper, so users who don't want to switch to JSON configurations yet wouldn't have to.
> When complete, configurations could be written as XML, JSON, or YAML:
> * tika-config.xml
> * tika-config.json
> * tika-config.yaml
> Replace all uses of the old-syntax Tika config annotations with the POJO model deserialized from the XML/JSON/YAML.
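The sketch below shows how replacing the annotations might look, assuming Java 17. Today values are injected one at a time through annotated setters (such as the `@Field` setters used in 2.x). Under this approach, a component would instead receive its whole typed config once, at construction, regardless of whether it was loaded from XML, JSON, or YAML. `PdfParserConfig` and `PdfLikeParser` are hypothetical names, not Tika classes.

```java
import java.util.Objects;

public class TypedConfigDemo {

    // Hypothetical typed config, produced by whichever reader (XML, JSON, YAML) loaded the file.
    public record PdfParserConfig(boolean sortByPosition, int maxPages) {
        public PdfParserConfig {
            if (maxPages < 0) {
                throw new IllegalArgumentException("maxPages must be >= 0");
            }
        }
    }

    // Hypothetical parser: no annotated setters, so it cannot be left half-configured.
    public static final class PdfLikeParser {
        private final PdfParserConfig config;

        public PdfLikeParser(PdfParserConfig config) {
            this.config = Objects.requireNonNull(config, "config");
        }

        public PdfParserConfig config() {
            return config;
        }

        // Stand-in for real parsing logic: reports how the config would be applied.
        public String describe() {
            String pages = config.maxPages() == 0 ? "unlimited" : String.valueOf(config.maxPages());
            return "sortByPosition=" + config.sortByPosition() + ", maxPages=" + pages;
        }
    }

    public static void main(String[] args) {
        PdfLikeParser parser = new PdfLikeParser(new PdfParserConfig(true, 50));
        System.out.println(parser.describe());
    }
}
```

Because the config is immutable and validated up front, the parser never sees a partially applied configuration. The file format becomes purely a loading concern.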



--
This message was sent by Atlassian Jira
(v8.20.10#820010)