Posted to dev@tika.apache.org by "Cservenak, Tamas (JIRA)" <ji...@apache.org> on 2014/02/24 18:00:26 UTC

[jira] [Created] (TIKA-1247) Explode monolithic parsers module into smaller ones

Cservenak, Tamas created TIKA-1247:
--------------------------------------

             Summary: Explode monolithic parsers module into smaller ones
                 Key: TIKA-1247
                 URL: https://issues.apache.org/jira/browse/TIKA-1247
             Project: Tika
          Issue Type: Improvement
            Reporter: Cservenak, Tamas


Right now there is one monolithic parsers module that, when used from Maven, pulls in a huge tree of transitive dependencies (not only the whole Internet, but beyond). Also, I am not convinced that every use case that needs, for example, the HTML parser also needs the Microsoft-related parsers. Make it more granular.

Proposed solution: Explode the parsers module into a set of smaller modules and let the build tool figure out what the user needs. For example, if a Maven user adds the "chm" parser as a dependency, Maven will resolve the "chm" > "html" > "txt" and "tika-core" dependencies by itself, so no hunting through transitive dependencies (for inclusion or exclusion) is needed.
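To illustrate the intended effect, here is a minimal Python sketch of how transitive resolution walks such a module graph. The module names come from the chm example; the exact edges between them, and the "microsoft" module standing in for the Microsoft-related parsers, are assumptions for illustration only. Real Maven resolution also handles versions, scopes and exclusions, which this sketch ignores.

```python
from typing import Dict, List, Set

# Hypothetical module graph modelled on the chm example:
# chm -> html -> txt (and tika-core). "microsoft" is an illustrative
# stand-in for the Microsoft-related parsers, not a real artifact name.
MODULE_DEPS: Dict[str, List[str]] = {
    "tika-core": [],
    "txt": ["tika-core"],
    "html": ["txt", "tika-core"],
    "chm": ["html"],
    "microsoft": ["tika-core"],
}


def resolve(module: str, deps: Dict[str, List[str]]) -> List[str]:
    """Return the transitive closure of `module`, dependencies first.

    Assumes the graph is acyclic; modules already visited are skipped.
    """
    ordered: List[str] = []
    seen: Set[str] = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for dep in deps[name]:
            visit(dep)  # resolve a module's own dependencies before it
        ordered.append(name)

    visit(module)
    return ordered


if __name__ == "__main__":
    print(resolve("chm", MODULE_DEPS))
```

Depending only on "chm" yields tika-core, txt, html and chm, while the unrelated "microsoft" module is never pulled in; that is the granularity the split aims for.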

A work-in-progress PR with the ongoing work is open:
https://github.com/apache/tika/pull/5




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)