Posted to dev@tika.apache.org by "Ken Krugler (JIRA)" <ji...@apache.org> on 2014/02/25 00:18:19 UTC

[jira] [Commented] (TIKA-1247) Explode monolithic parsers module into smaller ones

    [ https://issues.apache.org/jira/browse/TIKA-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910959#comment-13910959 ] 

Ken Krugler commented on TIKA-1247:
-----------------------------------

There was a very lengthy discussion of this a few years ago, some of which was captured by TIKA-686.

Unfortunately, there was no clear consensus on a way forward (every proposed solution raised problems or concerns), so Jukka resolved TIKA-686 as "Won't Fix".

> Explode monolithic parsers module into smaller ones
> ---------------------------------------------------
>
>                 Key: TIKA-1247
>                 URL: https://issues.apache.org/jira/browse/TIKA-1247
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Cservenak, Tamas
>
> Right now there is one monolithic parsers module that, if used in Maven, pulls in not only the whole Internet, but beyond. Also, I am not certain that every use case needs every parser; for example, a user of the HTML parser may not need the Microsoft-related parsers. Make it more granular.
> Proposed solution: Explode the parsers module into a set of smaller modules and let the build tool figure out what the user needs. For example, if a Maven user adds the "chm" parser as a dependency, Maven will resolve the "chm" > "html" > "txt" and "tika-core" dependencies by itself, and no transitive dependency hunting (for inclusion or exclusion) is needed.
> There is a PR in WIP state with ongoing work:
> https://github.com/apache/tika/pull/5
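
As a rough illustration of the resolution the proposal relies on, the sketch below walks a small dependency graph and collects everything reachable from one module. The module names mirror the example in the issue text and are not real Tika artifact IDs; the "microsoft" entry and its dependency on "tika-core" are assumptions added only to show that unrelated modules stay out of the result. Maven's actual resolution also handles versions, scopes and conflicts, which this ignores.

```python
# Hypothetical module graph: names follow the example in the issue text,
# not real Tika artifact IDs. The "microsoft" entry is an assumption.
DEPENDS_ON = {
    "chm": ["html"],
    "html": ["txt"],
    "txt": ["tika-core"],
    "tika-core": [],
    "microsoft": ["tika-core"],
}


def resolve(module, graph=DEPENDS_ON):
    """Return the module plus all of its transitive dependencies."""
    seen = []
    stack = [module]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already collected via another path
        seen.append(current)
        # reversed() keeps the declared order when popping from the stack
        stack.extend(reversed(graph[current]))
    return seen


print(resolve("chm"))
```

With a split like this, a user who asks only for "chm" gets the chain down to "tika-core" and nothing from the "microsoft" branch, which is the behaviour the issue asks the build tool to provide.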


